Global ETD Search

71	Data Masking, Encryption, and their Effect on Classification Performance: Trade-offs Between Data Security and Utility Asenjo, Juan C. 01 January 2017 As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes from data sets can limit access to important details. It is believed that the use of data masking and encryption can impact the quality of data mining results. This dissertation investigated and compared the causal effects of data masking and encryption on classification performance as a measure of the quality of knowledge discovery. A review of the literature found a gap in the body of knowledge, indicating that this problem had not been studied before in an experimental setting. The objective of this dissertation was to gain an understanding of the trade-offs between data security and utility in the field of analytics and data mining. The research used a nationally recognized cancer incidence database to show how masking and encryption of potentially sensitive demographic attributes, such as patients’ marital status, race/ethnicity, origin, and year of birth, could have a statistically significant impact on the patients’ predicted survival. 
Performance parameters measured by four different classifiers delivered sizable variations in the range of 9% to 10% between a control group, where the select attributes were untouched, and two experimental groups where the attributes were substituted or suppressed to simulate the effects of the data protection techniques. In practice, this corroborated the potential risk of basing medical treatment decisions on data mining applications where attributes in the data sets are masked or encrypted for patient privacy and security concerns. Big Data; Data Analytics; Data Mining; Encryption; Knowledge Discovery; Masking; Computer Engineering; Computer Sciences
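As a rough illustration of the two experimental conditions described in entry 71, the sketch below applies substitution (masking) and suppression to a single record. The attribute names, the salted-hash substitution, and the sample record are assumptions made for illustration; the abstract does not say how the dissertation actually substituted values.

```python
import hashlib

# Hypothetical attribute names, loosely modelled on the demographic
# fields listed in the abstract (not the database's real column names).
SENSITIVE = ("marital_status", "race", "birth_year")


def mask(record, salt="demo-salt"):
    """Substitute each sensitive value with a short salted hash token."""
    out = dict(record)
    for key in SENSITIVE:
        if key in out:
            raw = f"{salt}:{key}:{out[key]}".encode()
            out[key] = hashlib.sha256(raw).hexdigest()[:8]
    return out


def suppress(record):
    """Drop the sensitive attributes entirely."""
    return {k: v for k, v in record.items() if k not in SENSITIVE}


# Control record (untouched) and the two experimental variants.
control = {"marital_status": "married", "race": "A", "birth_year": 1950, "stage": 2}
masked = mask(control)
suppressed = suppress(control)
```

A classifier trained on `masked` or `suppressed` records can then be compared against one trained on `control` records, which is the kind of comparison the abstract reports.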
72	Analýza reálných dat produktové redakce Alza.cz pomocí metod DZD / Analysis of real data from Alza.cz product department using methods of KDD Válek, Martin January 2014 This thesis deals with data analysis using methods of knowledge discovery in databases (KDD). The goal is to select appropriate methods and tools for implementing a specific project based on real data from the Alza.cz product department. Data analysis is performed using association rules and decision rules in Lisp-Miner and decision trees in RapidMiner. The methodology used is CRISP-DM. The thesis is divided into three main sections. The first section gives a theoretical summary of KDD, defining basic terms and describing the types of KDD tasks and methods. The second section introduces the CRISP-DM methodology. The practical part first introduces the company Alza.cz and its goals for this task. It then describes the basic structure of the data and its preparation for the next step (data mining). Finally, the results are evaluated and their possible uses are outlined.
73	Reálná úloha dobývání znalostí / The Real Knowledge Discovery Task Kolafa, Ondřej January 2012 The main objective of this thesis is to perform a real data mining task: classifying holders of term deposit accounts. The task uses anonymous data on bank customers with a low funds position. In accordance with the CRISP-DM methodology, the work proceeds through these steps: business understanding, data understanding, data preparation, modeling, evaluation and deployment. The RapidMiner application is used for modeling. The methods and procedures used in the task are described in the theoretical part, which introduces basic data mining concepts with special regard to the CRM segment, the CRISP-DM methodology, and techniques suitable for this task. Because the proportions of long-term account holders and non-holders differed, the data set had to be balanced in favour of holders. In the final stage, twelve models were built. According to the chosen criteria (area under the curve and F-measure), the two best models (logistic regression and Bayes network) were selected. The last stage of the data mining process discusses possible real-world use. Deployment is presented only as a set of recommendations, because it cannot be applied to the real situation.
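The two model-selection criteria named in entry 73, F-measure and area under the ROC curve, can be computed from scratch as in the following sketch. The function names and sample labels are illustrative assumptions and are not taken from the thesis, which used RapidMiner.

```python
def f_measure(y_true, y_pred):
    """F1 score for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: 1 = term deposit holder, 0 = non-holder.
labels = [1, 1, 0, 0]
print(f_measure(labels, [1, 0, 1, 0]))    # 0.5
print(auc(labels, [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise AUC is quadratic in the number of records, which is fine for a sketch but slow for large data sets.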
74	Implementace procedur pro předzpracování dat v systému Rapid Miner / Implementation of data preparation procedures for RapidMiner Černý, Ján January 2014 Knowledge Discovery in Databases (KDD) is gaining importance as the amount of collected data rises. Despite this, analytic software systems often provide only the basic and most commonly used procedures and algorithms. The aim of this thesis is to extend RapidMiner, one of the most frequently used systems, with some new procedures for data preprocessing. To understand and develop the procedures, it is important to be acquainted with KDD, with emphasis on the data preparation phase. It is also important to describe the analytical procedures themselves. Developing an extension for RapidMiner requires familiarity with the process of creating an extension and the tools that are used. Finally, the resulting extension is introduced and tested.
75	Computação Evolutiva para a Construção de Regras de Conhecimento com Propriedades Específicas / Evolutionary Computing for Knowledge Rule Construction with Specific Properties Adriano Donizete Pila 12 April 2007 A maioria dos algoritmos de aprendizado de máquina simbólico utiliza regras de conhecimento if-then como linguagem de descrição para expressar o conhecimento aprendido. O objetivo desses algoritmos é encontrar um conjunto de regras de classificação que possam ser utilizadas na predição da classe de novos casos que não foram vistos a priori pelo algoritmo. Contudo, este tipo de algoritmo considera o problema da interação entre as regras, o qual consiste na avaliação da qualidade do conjunto de regras induzidas (classificador) como um todo, ao invés de avaliar a qualidade de cada regra de forma independente. Assim, como os classificadores têm por objetivo uma boa precisão nos casos não vistos, eles tendem a negligenciar outras propriedades desejáveis das regras de conhecimento, como a habilidade de causar surpresa ou trazer conhecimento novo ao especialista do domínio. Neste trabalho, estamos interessados em construir regras de conhecimento com propriedades específicas de forma isolada, i.e. sem considerar o problema da interação entre as regras. Para esse fim, propomos uma abordagem evolutiva na qual cada indivíduo da população do algoritmo representa uma única regra e as propriedades específicas são codificadas como medidas de qualidade da regra, as quais podem ser escolhidas pelo especialista do domínio para construir regras com as propriedades desejadas. O algoritmo evolutivo proposto utiliza uma rica estrutura para representar os indivíduos (regras), a qual possibilita considerar uma grande variedade de operadores evolutivos. O algoritmo utiliza uma função de aptidão multi-objetivo baseada em ranking que considera de forma concomitante mais que uma medida de avaliação de regra, transformando-as numa função simples-objetivo. 
Como a avaliação experimental é fundamental neste tipo de trabalho, para avaliar nossa proposta foi implementada a Evolutionary Computing Learning Environment --- ECLE --- que é uma biblioteca de classes para executar e avaliar o algoritmo evolutivo sob diferentes cenários. Além disso, a ECLE foi implementada considerando futuras implementações de novos operadores evolutivos. A ECLE está integrada ao projeto DISCOVER, que é um projeto de pesquisa em desenvolvimento em nosso laboratório para a aquisição automática de conhecimento. Análises experimentais do algoritmo evolutivo para construir regras de conhecimento com propriedades específicas, o qual pode ser considerado uma forma de análise inteligente de dados, foram realizadas utilizando a ECLE. Os resultados mostram a adequabilidade da nossa proposta. / Most symbolic machine learning approaches use if-then knowledge rules as the description language in which the learned knowledge is expressed. The aim of these learners is to find a set of classification rules that can be used to predict new instances that have not been seen by the learner before. However, these sorts of learners take into account the rule interaction problem, which consists of evaluating the quality of the set of rules (classifier) as a whole, rather than evaluating the quality of each rule in an independent manner. Thus, as classifiers aim at good precision to classify unseen instances, they tend to neglect other desirable properties of knowledge rules, such as the ability to cause surprise or bring new knowledge to the domain specialist. In this work, we are interested in building knowledge rules with specific properties in an isolated manner, i.e. not considering the rule interaction problem. To this end, we propose an evolutionary approach where each individual of the algorithm population represents a single rule and the specific properties are encoded as rule quality measures, a set of which can be freely selected by the domain specialist. 
The proposed evolutionary algorithm uses a rich structure for individual representation which enables one to consider a great variety of evolutionary operators. The algorithm uses a ranking-based multi-objective fitness function that combines more than one rule evaluation measure concomitantly into a single objective. As experimentation plays an important role in this sort of work, in order to evaluate our proposal we have implemented the Evolutionary Computing Learning Environment --- ECLE --- which is a framework to evaluate the evolutionary algorithm in different scenarios. Furthermore, the ECLE has been implemented taking into account future development of new evolutionary operators. The ECLE is integrated into the DISCOVER project, a major research project under constant development in our laboratory for automatic knowledge acquisition and analysis. Experimental analysis of the evolutionary algorithm to construct knowledge rules with specific properties, which can also be considered an important form of intelligent data analysis, was carried out using ECLE. Results show the suitability of our proposal. Computação evolutiva; Descoberta de conhecimento; Regras de conhecimento; Evolutionary computing; Knowledge discovery; Knowledge rules
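One simple way to read the ranking-based fitness described in entry 75 is to rank every rule under each quality measure and sum the ranks. The sketch below does exactly that; the sum-of-ranks aggregation, the function name, and the measure values are assumptions, since the abstract does not specify how ECLE combines the rankings.

```python
def rank_based_fitness(population, measures):
    """Collapse several rule quality measures into one score per rule.

    population: list of rule identifiers.
    measures:   list of dicts mapping rule -> score (higher is better).
    Returns a dict mapping rule -> summed rank (lower is better).
    Ties are broken by position in `population`, because sorted() is stable.
    """
    totals = {rule: 0 for rule in population}
    for measure in measures:
        ordered = sorted(population, key=lambda r: measure[r], reverse=True)
        for rank, rule in enumerate(ordered, start=1):
            totals[rule] += rank
    return totals


# Hypothetical rules scored on two measures chosen by a domain specialist.
rules = ["r1", "r2", "r3"]
accuracy = {"r1": 0.9, "r2": 0.8, "r3": 0.7}
novelty = {"r1": 0.1, "r2": 0.9, "r3": 0.5}

fitness = rank_based_fitness(rules, [accuracy, novelty])
best = min(fitness, key=fitness.get)
print(fitness, best)  # {'r1': 4, 'r2': 3, 'r3': 5} r2
```

Working on ranks rather than raw scores means measures with different scales need no normalization before they are combined.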
76	O processo de extração de conhecimento de base de dados apoiado por agentes de software. / The process of knowledge discovery in databases supported by software agents. Robson Butaca Taborelli de Oliveira 01 December 2000 Os sistemas de aplicações científicas e comerciais geram, cada vez mais, imensas quantidades de dados os quais dificilmente podem ser analisados sem que sejam usadas técnicas e ferramentas adequadas de análise. Além disso, muitas destas aplicações são voltadas para Internet, ou seja, possuem seus dados distribuídos, o que dificulta ainda mais a realização de tarefas como a coleta de dados. A área de Extração de Conhecimento de Base de Dados diz respeito às técnicas e ferramentas usadas para descobrir automaticamente conhecimento embutido nos dados. Num ambiente de rede de computadores, é mais complicado realizar algumas das etapas do processo de KDD, como a coleta e processamento de dados. Dessa forma, pode ser feita a utilização de novas tecnologias na tentativa de auxiliar a execução do processo de descoberta de conhecimento. Os agentes de software são programas de computadores com propriedades como autonomia, reatividade e mobilidade, que podem ser utilizados para esta finalidade. Neste sentido, o objetivo deste trabalho é apresentar a proposta de um sistema multi-agente, chamado Minador, para auxiliar na execução e gerenciamento do processo de Extração de Conhecimento de Base de Dados. / Nowadays, commercial and scientific application systems generate huge amounts of data that cannot be easily analyzed without the use of appropriate tools and techniques. A great number of these applications are also Internet-based, with distributed data, which makes tasks such as data collection even more difficult. The field of Computer Science called Knowledge Discovery in Databases deals with the creation and use of tools and techniques that allow for the automatic discovery of knowledge from data. 
Applying these techniques in an Internet environment can be particularly difficult. Thus, new techniques need to be used in order to aid the knowledge discovery process. Software agents are computer programs with properties such as autonomy, reactivity and mobility that can be used in this way. In this context, the main goal of this work is to present the proposal of a multiagent system, called Minador, aimed at supporting the execution and management of the Knowledge Discovery in Databases process. agentes; KDD; mineração de dados; sistema multiagentes; agents; data mining; knowledge discovery in databases; multi-agents system
77	Geração automática de metadados: uma contribuição para a Web semântica. / Automatic metadata generation: a contribution to the semantic Web. Ferreira, Eveline Cruz Hora Gomes 05 April 2006 Esta Tese oferece uma contribuição na área de Web Semântica, no âmbito da representação e indexação de documentos, definindo um Modelo de geração automática de metadados baseado em contexto, a partir de documentos textuais na língua portuguesa, em formato não estruturado (txt). Um conjunto teórico amplo de assuntos ligados à criação de ambientes digitais semânticos também é apresentado. Conforme recomendado em SemanticWeb.org, os documentos textuais aqui estudados foram automaticamente convertidos em páginas Web anotadas semanticamente, utilizando o Dublin Core como padrão para definição dos elementos de metadados, e o padrão RDF/XML para representação dos documentos e descrição dos elementos de metadados. Dentre os quinze elementos de metadados Dublin Core, nove foram gerados automaticamente pelo Modelo, e seis foram gerados de forma semi-automática. Os metadados Description e Subject foram os que necessitaram de algoritmos mais complexos, sendo obtidos através de técnicas estatísticas, de mineração de textos e de processamento de linguagem natural. A finalidade principal da avaliação do Modelo foi verificar o comportamento dos documentos convertidos para o formato RDF/XML, quando estes foram submetidos a um processo de recuperação de informação. Os elementos de metadados Description e Subject foram exaustivamente avaliados, uma vez que estes são os principais responsáveis por apreender a semântica de documentos textuais. A diversidade de contextos, a complexidade dos problemas relativos à língua portuguesa, e os novos conceitos introduzidos pelos padrões e tecnologias da Web Semântica, foram alguns dos fortes desafios enfrentados na construção do Modelo aqui proposto. 
Apesar de se ter utilizado técnicas não muito novas para a exploração dos conteúdos dos documentos, não se pode ignorar que os elementos inovadores introduzidos pela Web Semântica ofereceram avanços que possibilitaram a obtenção de resultados importantes nesta Tese. Como demonstrado aqui, a junção dessas técnicas com os padrões e tecnologias recomendados pela Web Semântica pode minimizar um dos maiores problemas da Web atual, e uma das fortes razões para a implementação da Web Semântica: a tendência dos mecanismos de busca de inundarem os usuários com resultados irrelevantes, por não levarem em consideração o contexto específico desejado pelo usuário. Dessa forma, é importante que se dê continuidade aos estudos e pesquisas em todas as áreas relacionadas à implementação da Web Semântica, dando abertura para que sistemas de informação mais funcionais sejam projetados. / This Thesis offers a contribution to the Semantic Web area, in the scope of the representation and indexing of documents, defining a context-based Model for automatic metadata generation, starting from unstructured textual documents (txt) in the Portuguese language. A wide theoretical set of subjects related to the creation of semantic digital environments is also presented. As recommended in SemanticWeb.org, the textual documents studied here were automatically converted to semantically annotated Web pages, using Dublin Core as the standard for definition of metadata elements, and the RDF/XML standard for representation of documents and description of the metadata elements. Among the fifteen Dublin Core metadata elements, nine were automatically generated by the Model, and six were generated in a semiautomatic manner. The metadata elements Description and Subject were the ones that required more complex algorithms, being obtained through statistical techniques, text mining techniques and natural language processing. 
The main purpose of the evaluation of the Model was to verify the behavior of the documents converted to the RDF/XML format when they were submitted to an information retrieval process. The metadata elements Description and Subject were exhaustively evaluated, since these are the main ones responsible for capturing the semantics of textual documents. The diversity of contexts, the complexity of the problems related to the Portuguese language, and the new concepts introduced by the standards and technologies of the Semantic Web were some of the great challenges faced in the construction of the Model proposed here. Although the techniques used to explore the contents of the documents are not very new, we cannot ignore that the innovative elements introduced by the Semantic Web have offered improvements that made it possible to obtain important results in this Thesis. As demonstrated here, combining those techniques with the standards and technologies recommended by the Semantic Web can minimize one of the largest problems of the current Web, and one of the strong reasons for the implementation of the Semantic Web: the tendency of search mechanisms to flood users with irrelevant results, because they do not take into account the specific context desired by the user. Therefore, it is important that studies and research be continued in all of the areas related to the Semantic Web's implementation, opening the door for more functional information systems to be designed. Biblioteca digital; Descoberta de conhecimento; Digital library; Information recovery; Knowledge discovery; Recuperação da informação; Semântica; Semantics
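To make the target format of entry 77 concrete, the sketch below serializes a few Dublin Core elements as RDF/XML using Python's standard xml.etree.ElementTree module. The document URI and the metadata values are hypothetical; the namespace URIs are the standard RDF and Dublin Core 1.1 element namespaces. It shows only the output step, not the thesis's text mining Model.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_rdf_xml(uri, metadata):
    """Serialize Dublin Core element/value pairs as an RDF/XML string."""
    # Register prefixes so the output uses rdf: and dc: instead of ns0:, ns1:.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri}
    )
    for element, value in metadata.items():
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical document and metadata values, for illustration only.
xml_text = to_rdf_xml(
    "http://example.org/docs/1",
    {"title": "Sample document", "subject": "semantic web", "language": "pt"},
)
print(xml_text)
```

In the thesis, values such as Subject and Description would come from the statistical and text mining steps; here they are typed in by hand.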
78	A situation refinement model for complex event processing Alakari, Alaa A. 07 January 2021 Complex Event Processing (CEP) systems aim at processing large flows of events to discover situations of interest (SOI). Primarily, CEP uses predefined pattern templates to detect occurrences of complex events in an event stream. Extracting complex events is achieved by employing techniques such as filtering and aggregation to detect complex patterns of many simple events. In general, CEP systems rely on domain experts to define complex pattern rules to recognize SOI. However, the task of fine-tuning complex pattern rules in the event streaming environment faces two main challenges: increased pattern complexity, and the event streaming constraints under which such rules must be acquired and processed in near real-time. Therefore, to fine-tune the CEP pattern to identify SOI, the following requirements must be met: first, a minimum number of rules must be used to refine the CEP pattern to avoid increased pattern complexity, and second, domain knowledge must be incorporated in the refinement process to improve awareness about emerging situations. Furthermore, the event data must be processed upon arrival to cope with the continuous arrival of events in the stream and to respond in near real-time. In this dissertation, we present a Situation Refinement Model (SRM) that considers these requirements. In particular, we develop a Single-Scan Frequent Item Mining algorithm to acquire the minimal number of CEP rules, with the ability to adjust the level of refinement to fit the applied scenario. In addition, a cost-gain evaluation measure to determine the best tradeoff to identify a particular SOI is presented. / Graduate Complex Event Processing; Situational Awareness; Event Stream Processing; Real time data mining; Situation Refinement; Knowledge Discovery
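The abstract of entry 78 does not describe its Single-Scan Frequent Item Mining algorithm, so the sketch below substitutes the well-known Misra-Gries summary, which also finds frequent items in one pass with bounded memory. It is not the dissertation's algorithm; the event names and the parameter k are illustrative assumptions.

```python
def misra_gries(stream, k):
    """One-pass frequent-item summary that keeps at most k - 1 counters.

    Every item occurring more than len(stream) / k times is guaranteed to
    be among the returned keys. The counts are lower bounds, not exact.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical event stream; each event is seen once, on arrival.
events = ["login", "login", "error", "login", "click", "login", "error", "login"]
summary = misra_gries(events, k=3)
print(summary)  # {'login': 4, 'error': 1}
```

A smaller k keeps fewer candidates, which loosely mirrors the abstract's idea of adjusting the level of refinement, although that mapping is an interpretation rather than something the abstract states.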
79	Rough Sets Bankruptcy Prediction Models Versus Auditor Signalling Rates McKee, Thomas E. 01 December 2003 Rough set prediction capability was compared with actual auditor signaling rates for a large sample of United States companies from the 1991 to 1997 time period. Prior bankruptcy prediction research was carefully reviewed to identify 11 possible predictive factors that both had significant theoretical support and were present in multiple studies. Rough sets theory was used to develop two different bankruptcy prediction models, each containing four variables from the 11 possible predictive variables. In contrast with prior rough sets theory research, which suggested that rough sets theory offered significant bankruptcy predictive improvements for auditors, the rough sets models did not provide any significant comparative advantage with regard to prediction accuracy over the actual auditors' methodologies. bankruptcy prediction; corporate failure; going-concern; knowledge discovery; machine learning; rough sets
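The core operation behind the rough sets models in entry 79 is approximating a target set by indiscernibility classes. The following sketch computes lower and upper approximations; the firms, attribute names, and values are invented for illustration and are not the study's variables.

```python
from collections import defaultdict


def approximations(objects, condition_attrs, target):
    """Rough-set lower and upper approximations of `target`.

    objects:         dict mapping object name -> dict of attribute values.
    condition_attrs: attributes that define indiscernibility.
    target:          set of object names (for example, bankrupt firms).
    """
    # Objects with identical condition values fall into the same class.
    classes = defaultdict(set)
    for name, attrs in objects.items():
        classes[tuple(attrs[a] for a in condition_attrs)].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical firms described by two discretized financial ratios.
firms = {
    "A": {"liquidity": "low", "leverage": "high"},
    "B": {"liquidity": "low", "leverage": "high"},
    "C": {"liquidity": "high", "leverage": "low"},
    "D": {"liquidity": "low", "leverage": "low"},
}
bankrupt = {"A", "D"}

lower, upper = approximations(firms, ["liquidity", "leverage"], bankrupt)
print(sorted(lower), sorted(upper))  # ['D'] ['A', 'B', 'D']
```

Firms A and B are indiscernible yet only A failed, so they land in the boundary region (upper minus lower), where a rough sets model cannot give a certain prediction.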
80	Získávání znalostí z databází / Knowledge Data Discovery Melichar, Ladislav Unknown Date Data mining is still a little-investigated area. This project first addresses knowledge discovery from structured data in general, and from data in XML format in particular. The tree-mining algorithm HybridTreeMiner is then presented with the aim of applying it to knowledge discovery from XML documents. The practical part of the project is dedicated to designing a concept for integrating the algorithm into the mining system developed at FIT. This system is implemented in the Java programming language, has a modular structure, and its parts communicate with each other by means of the language DMSL. The results achieved are presented and discussed at the end.
