  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
261

Segmentation of VAT-Filing Taxpayers Using Data Mining Techniques

Lückeheide Codjambassis, Sandra January 2007
No description available.
262

Using Machine Learning and Text Mining Algorithms to Facilitate Research Discovery of Plant Food Metabolomics and Its Application for Human Health Benefit Targets

Mathew, Jithin Jose January 2020
With the number of scholarly articles published every day growing, the need for an automated tool to support systematic exploratory literature review is rising. With advances in text mining and machine learning methods, such data exploration tools are being researched and developed in every scientific domain. This research aims to find the best keyphrase extraction algorithm and topic modeling algorithm to serve as the foundation and main components of a tool that aids systematic literature review. Based on experiments with a set of highly relevant scholarly articles published in the domain of food science, two graph-based keyphrase extraction algorithms, TopicalPageRank and PositionRank, were picked as the best of nine keyphrase extraction algorithms for selecting domain-specific keywords. Of the two topic modeling algorithms compared, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), the documents chosen in this research were best classified into suitable topics by the NMF method, as validated by a domain expert. This research lays the framework for faster tool development for systematic literature review.
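The abstract does not show what NMF-based topic modeling looks like in practice. As a rough illustration only (a minimal scikit-learn sketch; the toy corpus, preprocessing, and parameters below are placeholder assumptions, not the thesis's actual pipeline):

```python
# Minimal sketch of NMF topic modeling over a handful of abstracts.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "keyphrase extraction from food science articles",
    "metabolomics of plant foods and human health",
    "topic modeling for systematic literature review",
]

# TF-IDF weighting is the usual input representation for NMF topic models.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# Factor the document-term matrix into document-topic (W) and
# topic-term (H) matrices; n_components is the number of topics.
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

terms = tfidf.get_feature_names_out()
for k, topic in enumerate(H):
    top = [terms[i] for i in topic.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```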
263

An Enterprise Architecture Proposal for Integrating Support Processes of a Mining Operation Using the TOGAF Reference Framework

Guicela Gavidia, Rodolfo Alejos 09 February 2016
This work proposes the implementation of an enterprise architecture for optimizing the support processes of a mining operation, using the TOGAF reference framework. Information technology must form part of the strategic plans of today's organizations, in order to identify and improve the key aspects of value and efficiency in the core business processes, and thus to weather any crisis in the financial market and remain competitive against other companies in the same industry. The company selected for this proposal is Hochschild Mining plc, one of the world's leading producers of primary silver. Hochschild Mining plc is a Peruvian company with more than 104 years of operation in precious-metals mining (gold and silver). Its principal shareholder is the Peruvian businessman Eduardo Hochschild Beeck. It currently operates in Peru, Mexico, Argentina, and Chile, with headquarters in Lima, Peru. The scope of this proposal is the Peruvian operations. The company had always allotted a generous, unrestricted budget to its core mining processes; however, the sector has been going through a negative period worldwide as a result of falling metal prices. This forced the company to adopt austerity measures and reduce operating costs, which involved not only renegotiating with suppliers and restricting some services but also laying off workers. For the 2014-2015 period it achieved cost reductions on the order of 300 million dollars through optimization of the operation, mine, and geology processes, the processes with the largest share of the company's budget in both OPEX and CAPEX. However, the impact that optimized and integrated support processes could have on the efficiency of its mining operations was not evaluated. An improvement opportunity was identified in optimizing and standardizing the operation's support processes, which accounted for 15% of the operation's total budget. These processes had deficiencies in integration and optimization and were mostly manual, often leading to errors that translated into expenses and fines paid to regulators. This proposal takes as its model two administrative support processes, considered the most critical and important because of their interaction with personnel and because of the reports and controls they maintain with the sector's oversight and regulatory bodies (food-services management and industrial-safety management). The project analysis considered costs, schedule, and risks, as well as the benefits expected on completion, showing that the required investment is recovered in a short time through the resulting savings. Finally, it is recommended to maintain a continuous improvement model, constantly monitoring and measuring the state of the processes in order to propose further improvements. / Thesis
264

Estimation techniques for advanced database applications

Peng, Yun 01 January 2013
No description available.
265

Machine Learning Algorithm Performance Optimization: Solving Issues of Big Data Analysis

Sohangir, Soroosh 01 December 2015
Because of high time and space complexity, generating machine learning models for big data is difficult. This research introduces a novel approach to optimizing the performance of learning algorithms, with a particular focus on big data manipulation. To implement this method, a machine learning platform incorporating eighteen machine learning algorithms was built. The platform was tested on four different use cases, and the results are illustrated and analyzed.
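The abstract describes a platform that runs many learning algorithms and compares them across use cases. A minimal sketch of that genre of benchmarking, assuming scikit-learn and three stand-in learners (the thesis's eighteen algorithms and four use cases are not listed in the abstract):

```python
# Run several scikit-learn learners on one dataset and compare
# cross-validated accuracy and wall-clock time.
import time
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
learners = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(),
    "forest": RandomForestClassifier(n_estimators=100),
}

for name, clf in learners.items():
    start = time.perf_counter()
    scores = cross_val_score(clf, X, y, cv=3)  # 3-fold cross-validation
    elapsed = time.perf_counter() - start
    print(f"{name}: acc={scores.mean():.3f} time={elapsed:.2f}s")
```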
266

Quantization of Real-Valued Attributes for Data Mining

Qaiser, Elizae 11 October 2001
No description available.
267

Towards Algorithm Transformation for Temporal Data Mining on GPU

Ponce, Sean Philip 18 August 2009
Data mining allows one to analyze large amounts of data. With increasing amounts of data being collected, more computing power is needed to mine these ever larger volumes of data. The GPU is an excellent piece of hardware with a compelling price-to-performance ratio and has rapidly risen in popularity. However, this increase in speed comes at a cost: the GPU's architecture executes non-data-parallel code with marginal speedup or even slowdown. The type of data mining we examine, temporal data mining, uses a finite state machine (FSM), which is not data parallel. We contribute the concept of algorithm transformation for increasing the data parallelism of an algorithm. We apply the algorithm transformation process to temporal data mining, producing an algorithm that solves the same problem as the FSM-based algorithm but is data parallel. The new GPU implementation shows a 6x speedup over the best CPU implementation and an 11x speedup over a previous GPU implementation. / Master of Science
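To make the FSM-versus-data-parallel contrast concrete, here is a hedged toy sketch in Python/NumPy. The event stream, episode, and counting semantics are simplified stand-ins, not the thesis's actual algorithm: the second version counts possibly-overlapping matches with one independent tracker per start event, and that independence is the property that maps naturally onto GPU threads.

```python
# Contrast a serial FSM scan with a data-parallel-style reformulation.
import numpy as np

stream = list("AXBACBYCABC")   # toy event stream
episode = ("A", "B", "C")      # ordered episode to count

def fsm_count(stream, episode):
    """Serial FSM: advance one state per matching event; inherently sequential."""
    state, count = 0, 0
    for ev in stream:
        if ev == episode[state]:
            state += 1
            if state == len(episode):        # episode completed
                count, state = count + 1, 0  # non-overlapped counting
    return count

def tracker_count(stream, episode):
    """One independent tracker per occurrence of the first symbol; each
    tracker scans forward on its own, so all trackers can run in parallel."""
    arr = np.array(stream)
    count = 0
    for start in np.flatnonzero(arr == episode[0]):  # each pass is independent
        pos, ok = start, True
        for sym in episode[1:]:
            later = np.flatnonzero(arr[pos + 1:] == sym)
            if later.size == 0:
                ok = False
                break
            pos = pos + 1 + later[0]
        count += ok
    return count

print(fsm_count(stream, episode), tracker_count(stream, episode))  # 2 3
```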
268

Optimizing and Understanding Network Structure for Diffusion

Zhang, Yao 16 October 2017
Given a population contact network and electronic medical records of patients, how should vaccines be distributed to individuals to effectively control a flu epidemic? Similarly, given the Twitter following network and tweets, how does one choose the best communities/groups to stop rumors from spreading? How does one find the best accounts that bridge celebrities and ordinary users? These questions are related to diffusion (aka propagation) phenomena. Diffusion can be treated as the behavior of contagions (viruses, ideas, memes, etc.) spreading over some underlying network. It is omnipresent in areas such as social media, public health, and cyber security. Examples include diseases like flu spreading on person-to-person contact networks, memes disseminating through online adoption over friendship networks, and malware propagating among computer networks. When a contagion spreads, network structure (nodes/edges/groups, etc.) plays a major role in determining the outcome. For instance, a rumor, if propagated by celebrities, can go viral. Similarly, an epidemic can die out quickly if vulnerable demographic groups are successfully targeted for vaccination. Hence, in this thesis we aim to optimize and better understand network structure in light of diffusion. We optimize graph topologies by removing nodes/edges to control rumors/viruses, and we gain a deeper understanding of a network in terms of diffusion by exploring how nodes group together into similar roles of dissemination. We develop several novel graph mining algorithms, at different levels of granularity (node/edge level to group/community level), from model-driven and data-driven perspectives, focusing on topics like immunization on networks, graph summarization, and community detection. In contrast to previous work, we are the first to systematically develop realistic, implementable, data-based graph algorithms to control contagions. In addition, this thesis is also the first work to use diffusion to effectively summarize graphs and to understand the communities/groups of networks in a general way.

1. Model-driven. Diffusion processes are usually described using mathematical models, e.g., the Independent Cascade (IC) model in social media and the Susceptible-Infectious-Recovered (SIR) model in epidemiology. Given such models, we propose to optimize network structure for controlling propagation (the immunization problem) in several practical and implementable settings, taking into account the presence of infections, the uncertain nature of the data, and the group structure of the population. We develop efficient algorithms for different interventions, such as vaccination (node removal) and quarantining (edge removal). In addition, we study the graph coarsening problem for both static and temporal networks to obtain a better understanding of the relations among nodes when a contagion is propagating: we seek a much smaller representation of a large network that preserves its diffusive properties.

2. Data-driven. Model-driven approaches can provide ideal results if the underlying diffusion models are given. However, in many situations diffusion processes are very complicated, and it is challenging or even impossible to pick the most suitable model to describe them. In addition, rapid technological development has provided an abundance of data, such as tweets and electronic medical records. Hence, in the second part of the thesis, we explore data-driven approaches to diffusion in networks, which work directly on propagation data and relax the modeling assumptions about diffusion. Specifically, we first develop data-driven immunization strategies to stop rumors or allocate vaccines by optimizing network topologies, using large-scale national-level diagnostic patient data with billions of flu records. Second, we propose a novel community detection problem to discover "bridge" and "celebrity" communities from social media data, and design case studies to understand the roles of nodes/communities using diffusion.

Our work has many applications in areas such as epidemiology, sociology, and computer science. For example, our work on efficient immunization algorithms, such as data-driven immunization, can help the CDC better allocate vaccines to control flu epidemics in major cities. Similarly, in social media, our work on understanding network structure using diffusion can lead to better community discovery, such as finding media accounts that can boost tweet promotions on Twitter. / Ph. D. / In public health, how should vaccines be distributed to effectively control an epidemic like flu in a population? In social media, how can we identify the different roles of users who participate in the spread of content through social networks? These questions and many others are related to diffusion (aka propagation) phenomena in networks (aka graphs). Networks, as natural structures to model relations between objects, arise in many areas, such as online social networks, population contact networks, and the Internet. Diffusion can be treated as the behavior of contagions (viruses, ideas, memes, etc.) spreading over some underlying network. It is also prevalent: diseases like flu spread on person-to-person contact networks, memes disseminate through online adoption over friendship networks, and malware propagates among computer networks. When a contagion spreads, network structure (nodes/edges/groups, etc.) plays a major role in determining the outcome. For instance, a rumor, if propagated by celebrities, can go viral. Similarly, an epidemic can die out quickly if vulnerable demographic groups are successfully targeted for vaccination. This thesis targets a general audience and provides a comprehensive study of how to optimize and better understand network structure in light of diffusion. We optimize graph topologies by removing nodes/edges to control rumors/viruses, and we gain a deeper understanding of a network in terms of diffusion by exploring how nodes group together into similar roles of dissemination. In contrast to previous work, we are the first to systematically develop realistic, implementable, data-based graph algorithms to control contagions. In addition, this thesis is also the first work to use diffusion to effectively summarize graphs and to understand the communities/groups of networks in a general way. Our work has many applications in areas such as epidemiology, sociology, and computer science. For example, our work on efficient immunization algorithms, such as data-driven immunization, can help experts better allocate vaccines to control flu epidemics. Similarly, in social media, our work on understanding network structure using diffusion can lead to better community discovery, such as finding media accounts that can boost tweet promotions on Twitter.
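As a toy illustration of immunization as node removal (far cruder than the model- and data-driven strategies the thesis develops), one can remove the k highest-degree nodes of a scale-free network and watch the largest connected component shrink, a rough proxy for limiting how far a contagion can travel. The library (networkx) and all parameters here are our own assumptions, not the thesis's:

```python
# Degree-based immunization sketch: vaccinate (remove) the k biggest hubs
# and compare the largest connected component before and after.
import networkx as nx

G = nx.barabasi_albert_graph(n=500, m=2, seed=42)  # scale-free toy network
k = 10

before = len(max(nx.connected_components(G), key=len))

# Heuristic: remove the k most-connected nodes.
top_k = sorted(G.degree, key=lambda nd: nd[1], reverse=True)[:k]
G.remove_nodes_from(node for node, _ in top_k)

after = len(max(nx.connected_components(G), key=len))
print(f"largest component: {before} -> {after} after removing {k} hubs")
```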
269

A visualization framework for exploring correlations among attributes of a large dataset and its applications in data mining

Techaplahetvanich, Kesaraporn January 2007
[Truncated abstract] Many databases in scientific and business applications have grown exponentially in size in recent years. Accessing and using databases is no longer a specialized activity, as more and more ordinary users without any specialized knowledge try to gain information from them. Both expert and ordinary users face significant challenges in understanding the information stored in databases. In most cases the databases are so large that it is impossible to gain useful information by inspecting data tables, the most common form of storing data in relational databases. Visualization has emerged as one of the most important techniques for exploring data stored in large databases. Appropriate visualization techniques can reveal trends, correlations, and associations in data that are very difficult to understand from a textual representation of the data. This thesis presents several new frameworks for data visualization and visual data mining. The first technique, VisEx, is useful for the visual exploration of large multi-attribute datasets, and especially for exploring the correlations among the attributes in such datasets. Most previous visualization techniques can display correlations among only two or three attributes at a time without excessive screen clutter. ... Although many algorithms for mining association rules have been researched extensively, they do not incorporate users in the process, and most of them generate a large number of association rules. It is quite often difficult for the user to analyze a large number of rules to identify the small subset of rules that is of importance. In this thesis I present a framework for the user to interactively mine association rules visually. Another challenging task in data mining is to understand the correlations among the mined association rules, as it is often difficult to identify a relevant subset of association rules from a large number of mined rules. A further contribution of this thesis is a simple framework in the VisAR system that allows the user to explore a large number of association rules visually. A variety of businesses have adopted new technologies for storing large amounts of data, and analysis of historical data quite often offers new insights into business processes that may increase productivity and profit. On-line analytical processing (OLAP) has become a powerful tool for business analysts to explore historical data, and effective visualization techniques are very important for supporting OLAP technology. A new technique for the visual exploration of OLAP data cubes is also presented in this thesis.
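For readers unfamiliar with the quantities that association-rule miners produce (and that systems like VisAR let users explore visually), a self-contained sketch of support and confidence over toy transactions may help; the data and rule enumeration below are illustrative assumptions, not part of the thesis:

```python
# Compute support and confidence for simple X -> Y association rules.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate rules over 2-item sets and report support/confidence.
items = set().union(*transactions)
for x, y in combinations(sorted(items), 2):
    s = support({x, y})
    if s == 0:
        continue
    conf = s / support({x})  # confidence of the rule x -> y
    print(f"{x} -> {y}: support={s:.2f} confidence={conf:.2f}")
```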
270

An Architecture for High-Performance, Privacy-Preserving, and Distributed Data Mining

Secretan, James 01 January 2009
This dissertation discusses the development of an architecture and associated techniques to support Privacy Preserving and Distributed Data Mining. The field of Distributed Data Mining (DDM) attempts to solve the challenges inherent in coordinating data mining tasks over geographically distributed databases, through the application of parallel algorithms and grid computing concepts. The closely related field of Privacy Preserving Data Mining (PPDM) adds the dimension of privacy, seeking ways for organizations to collaborate in mining their databases collectively while preserving the privacy of their records. Developing data mining algorithms for DDM and PPDM environments can be difficult, and there is little software to support it. In addition, because these tasks can be computationally demanding, taking hours or even days to complete, organizations should be able to take advantage of high-performance and parallel computing to accelerate them. Unfortunately, no existing framework provides all of these services easily for a developer. In this dissertation such a framework, called APHID (Architecture for Private, High-performance Integrated Data mining), is developed to support the creation and execution of DDM and PPDM applications. The architecture allows users to flexibly and seamlessly integrate cluster and grid resources into their DDM and PPDM applications. The architecture is scalable and is split into highly decoupled services to ensure flexibility and extensibility. This dissertation first develops a comprehensive example algorithm, a privacy-preserving Probabilistic Neural Network (PNN), which serves as a basis for analyzing the difficulties of DDM/PPDM development. The privacy-preserving PNN is the first such PNN in the literature, and provides not only a practical algorithm ready for use in privacy-preserving applications, but also a template for other data-intensive algorithms and a starting point for analyzing APHID's architectural needs. After analyzing the difficulties in the PNN algorithm's development, as well as the shortcomings of existing systems, this dissertation presents the first concrete programming model joining high-performance computing resources with a privacy-preserving data mining process. Unlike many existing PPDM development models, the platform of services is language independent, allowing layers and algorithms to be implemented in popular languages (Java, C++, Python, etc.). An implementation of a PPDM algorithm is developed in Java utilizing the new framework. Performance results are presented, showing that APHID can enable highly simplified PPDM development while speeding up resource-intensive parts of the algorithm.
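The dissertation's privacy-preserving PNN protocol is not reproduced in the abstract; as a hedged sketch of the genre of primitive that PPDM systems build on, here is additive secret sharing for a joint sum, in which no party reveals its raw value (all names and parameters are illustrative assumptions, not APHID's actual protocol):

```python
# Additive secret sharing: parties jointly compute a sum without
# revealing individual values.
import random

Q = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split `value` into n random shares that sum to it mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

# Three hypothetical hospitals each secret-share a private count.
private_counts = [120, 75, 310]
all_shares = [share(v, 3) for v in private_counts]

# Each party sums the shares it holds (column-wise); the partial sums
# are then combined, and no party ever sees another's raw count.
partials = [sum(col) % Q for col in zip(*all_shares)]
total = sum(partials) % Q
assert total == sum(private_counts)
print("joint sum:", total)
```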
