Global ETD Search

1	Seleção de atributos em agrupamento de dados utilizando algoritmos evolutivos / Feature subset selection in data clustering using evolutionary algorithm Martarelli, Nádia Junqueira 03 August 2016 With the advent of information technology, the analysis and interpretation of data ceased to be carried out exclusively by humans and came to rely on computational support for knowledge discovery in large databases. This support requires that activities previously performed manually be organized and ordered into a process made up of three major stages. The first stage includes a dimensionality reduction task, which aims to eliminate attributes that do not contribute to the data analysis and therefore results in the selection of a subset of the original attributes. Selecting a subset of attributes can be viewed as a search problem, since the original attributes can be combined into subsets in a very large number of ways. One search strategy that can be adopted is random search, performed by a genetic algorithm or one of its variants. This work proposes applying two variants of the genetic algorithm, the Constructive Genetic Algorithm and the Biased Random Key Genetic Algorithm, to the feature selection problem in data clustering, since these two variants have not yet been applied to this problem. To assess their performance, both variants were compared with the traditional genetic algorithm and with each other, using three data sets taken from the UCI Machine Learning Repository. The results showed that, in terms of solution quality, the Constructive Genetic Algorithm and the Biased Random Key Genetic Algorithm generally performed better than the traditional approach. A significant difference in efficiency was also observed between the two variants and the traditional approach. Clustering data; Evolutionary algorithm; Feature subset selection; Genetic algorithms
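The abstract does not say how a chromosome encodes a feature subset. As a minimal sketch, assuming the common random-key convention in which each gene is a real number in [0, 1) and a feature is kept when its key reaches a threshold, decoding and elite-biased crossover might look like this. The names `decode` and `biased_crossover`, the 0.5 threshold, and the 0.7 elite bias are illustrative assumptions, not details from the thesis.

```python
import random

def decode(keys, threshold=0.5):
    """Map a random-key chromosome to the indices of the selected features."""
    return [i for i, k in enumerate(keys) if k >= threshold]

def biased_crossover(elite, other, rho=0.7, rng=random):
    """Build a child gene by gene, taking the elite parent's gene with probability rho."""
    return [e if rng.random() < rho else o for e, o in zip(elite, other)]

rng = random.Random(0)              # fixed seed so the sketch is repeatable
elite = [0.9, 0.1, 0.8, 0.3]        # hypothetical elite chromosome (4 features)
other = [0.2, 0.7, 0.4, 0.6]        # hypothetical non-elite chromosome
child = biased_crossover(elite, other, rng=rng)

print(decode(elite))                # [0, 2]
print(decode(child))                # subset encoded by the child
```

In a full algorithm each decoded subset would be scored by running a clustering method on the selected attributes and measuring cluster quality; that fitness function is left out here because the abstract does not specify it.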
3	Patim: Proximity Aware Time Management Okutanoglu, Aydin 01 October 2008 Logical time management is used to synchronize the execution of distributed simulation elements. In existing time management systems, such as the High Level Architecture (HLA), the logical times of all simulation elements are synchronized. In some cases, however, this synchronization unnecessarily decreases the performance of the system. In the proposed HLA-based time management mechanism, PATiM, federates are clustered into logically related groups. The relevance of federates to one another is taken to be a function of proximity, defined as the distance between them in the virtual space, so each federate cluster is composed of federates that are relatively close to one another according to the calculated distances. When federate clusters are sufficiently far apart, there is no need to synchronize them, as they are not related to each other. In the PATiM mechanism, therefore, inter-cluster logical times are not synchronized while clusters are sufficiently distant. If distant federate clusters later approach each other, they must resynchronize their logical times. This temporal partitioning aims to reduce network traffic and time management calculations and to increase concurrency between federates. Results from case applications verified that clustering improves local performance as soon as federates become unrelated.
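The proximity idea can be illustrated with a small sketch. It assumes federates have 2-D positions and that two federates are related when they lie within a fixed radius of each other. The grouping rule (connected components), the function names, and the radius are assumptions made for illustration, not the PATiM algorithm itself.

```python
from math import dist

def cluster_by_proximity(positions, radius):
    """Group federates into connected components: two federates are linked
    when they lie within `radius` of each other."""
    unvisited = set(positions)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {f for f in unvisited
                    if dist(positions[current], positions[f]) <= radius}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def must_synchronize(a, b, positions, radius):
    """Two clusters need a common logical time only if some pair of members is close."""
    return any(dist(positions[x], positions[y]) <= radius for x in a for y in b)

# Hypothetical federates: two pairs that start far apart.
positions = {"f1": (0, 0), "f2": (1, 0), "f3": (50, 50), "f4": (51, 50)}
clusters = cluster_by_proximity(positions, radius=5)

positions["f3"] = (3, 0)  # f3 moves next to the other pair
print(must_synchronize(clusters[0], clusters[1], positions, radius=5))  # True
```

Re-running the check after each position update mirrors the behaviour described in the abstract: clusters advance independently while apart and resynchronize once they come close.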
4	Agrupamento de curvas de carga para redução de bases de dados utilizadas na previsão de carga de curto prazo / Clustering of load profiles for short term load forecasting Muller, Marcos Ricardo 21 February 2014 / Fundação Parque Tecnológico Itaipu / This work applies clustering techniques to load curves at the consumer level, the least aggregated level, for use in the similar days method of load forecasting. The goal is to obtain reduced data sets that impose a lower computational load on the forecasting algorithm while achieving performance similar or superior to that of the traditional similar days method, which uses the original data set. The similar days method forecasts load over very short horizons from historical electricity consumption data, together with related data that allow analogies to be drawn with a future day. Conventional implementations of the method are used for comparison and validation. The scenario that provides the data for the studies, the equipment employed, and the data preprocessing stage are presented. The clusters are validated using cluster silhouette analysis. The MAPE values made it possible to assess the accuracy of the forecasts and indicated the superiority of the method based on clustered load curves. Similar Days Method; load forecasting; clustering data; electronic meters
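As a rough illustration of the two quantities named in the abstract, the sketch below computes MAPE (mean absolute percentage error) and picks the most similar stored load curve by squared Euclidean distance. It assumes the reduced data set consists of cluster centroids and that similarity is plain Euclidean distance. Both are assumptions for the example, as are the function names and the numbers.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def most_similar(target, curves):
    """Return the stored curve with the smallest squared distance to `target`."""
    return min(curves, key=lambda c: sum((x - y) ** 2 for x, y in zip(target, c)))

# Hypothetical cluster centroids standing in for the reduced database.
centroids = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
observed = [3.8, 5.1, 6.2]

print(most_similar(observed, centroids))                  # [4.0, 5.0, 6.0]
print(round(mape([100, 200, 400], [110, 180, 400]), 2))   # 6.67
```

Searching a handful of centroids instead of every historical day is what reduces the computational load; whether accuracy holds up is exactly what the MAPE comparison in the thesis is meant to check.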
5	Data Mining Methods For Clustering Power Quality Data Collected Via Monitoring Systems Installed On The Electricity Network Guder, Mennan 01 September 2009 Increasing power demand and the wide use of high-technology power electronic devices have created a need for power quality monitoring. The quality of electric power in both transmission and distribution systems should be analyzed in order to sustain power system reliability and continuity. This analysis is made possible by examining data collected by power quality monitoring systems. To define the characteristics of the power system and reveal the relations between power quality events, a huge amount of data must be processed. In this thesis, clustering methods for power quality events are developed using exclusive and overlapping clustering models. The methods are designed to cluster the huge amount of power quality data obtained from the online monitoring of the Turkish Electricity Transmission System. The main issues considered in the design of the clustering methods are the amount of data, the efficiency of the designed algorithm, and the queries that should be supplied to the domain experts. This research work is fully supported by the Public Research Grant Committee (KAMAG) of TUBITAK within the scope of the National Power Quality Project (105G129).
6	Multi-Agent User-Centric Specialization and Collaboration for Information Retrieval Mooman, Abdelniser January 2012 The amount of information on the World Wide Web (WWW) is rapidly growing in pace and topic diversity. This has made it increasingly difficult, and often frustrating, for information seekers to retrieve the content they are looking for, as information retrieval systems (e.g., search engines) are unable to decipher the relevance of the retrieved information as it pertains to the information being sought. This issue can be decomposed into two aspects. 1) Variability of information relevance as it pertains to an information seeker. In other words, different information seekers may enter the same search text, or keywords, but expect completely different results. It is therefore imperative that information retrieval systems be able to incorporate a model of the information seeker in order to estimate the relevance and context of use of information before presenting results. In this context, a model means the capture of trends in the information seeker's search behaviour. This is what many researchers refer to as personalized search. 2) Information diversity. Information available on the World Wide Web today spans multitudes of inherently overlapping topics, and it is difficult for any information retrieval system to decide effectively on the relevance of the information retrieved in response to an information seeker's query. For example, an information seeker who wishes to use the WWW to learn about a cure for a certain illness would receive a more relevant answer if the search engine were optimized for such topic domains. This is what is referred to in WWW nomenclature as a 'specialized search'. This thesis maintains that the information seeker's search is not intended to be completely random and therefore tends to exhibit consistent patterns of behaviour.
Nonetheless, this behaviour, despite being consistent, can be quite complex to capture. To accomplish this goal the thesis proposes a Multi-Agent Personalized Information Retrieval with Specialization Ontology (MAPIRSO). MAPIRSO offers a complete learning framework that is able to model the end user's search behaviour and interests and to organize information into categorized domains so as to ensure maximum relevance of its responses as they pertain to the end user's queries. Specialization and personalization are accomplished using a group of collaborative agents. Each agent employs a Reinforcement Learning (RL) strategy to capture the end user's behaviour and interests. Reinforcement learning allows the agents to evolve their knowledge of the end user's behaviour and interests as they function to serve him or her. Furthermore, RL allows each agent to adapt to changes in an end user's behaviour and interests. Specialization is the process by which new information domains are created based on existing information topics, allowing new kinds of content to be built exclusively for information seekers. One of the key characteristics of specialization domains is that they are seeker-centric, which allows intelligent agents to create new information based on the information seekers' feedback and behaviours. Specialized domains are created by intelligent agents that collect information from a specific domain topic. The task of these specialized agents is to map the user's query to a repository of specific domains in order to present users with relevant information. Mapping users' queries to only relevant information is one of the fundamental challenges in Artificial Intelligence (AI) and machine learning research. Our approach employs intelligent cooperative agents that specialize in building personalized ontology information domains that pertain to each information seeker's specific needs.
Specializing and categorizing information into unique domains is one of the challenge areas that has been addressed, and various proposed solutions have been evaluated and adopted to cope with the growth of information. However, categorizing information into unique domains does not satisfy each individual information seeker. Information seekers might search for similar topics, but each would have different interests. For example, information from a specific medical domain has different importance to a doctor than to a patient. The thesis presents a novel solution that addresses the growth and diversity of information by building seeker-centric specialized information domains that are personalized through the information seekers' feedback and behaviours. To address this challenge, the research examines the fundamental components that constitute the specialized agent: an intelligent machine learning system, user input queries, an intelligent agent, and information resources constructed through specialized domains. Experimental work is reported to demonstrate the efficiency of the proposed solution in addressing the growth of overlapping information. The experimental work utilizes extensive user-centric specialized domain topics. This work employs personalized, collaborative learning agents and ontology techniques, thereby enriching the queries and domains of the user. The experimental results show that specialized ontology domains built around the information seekers' needs are more precise and efficient than other information retrieval applications and existing search engines. Electrical and Computer Engineering
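The abstract says each agent uses reinforcement learning to track a user's interests but gives no algorithm. As a hedged sketch only, the following uses a simple epsilon-greedy value update over topic domains, one of the most basic reinforcement learning schemes. The class name `InterestAgent`, the domains, the reward values, and the parameters `epsilon` and `alpha` are all hypothetical and are not taken from MAPIRSO.

```python
import random

class InterestAgent:
    """Tracks an estimated interest value per topic domain for one user."""

    def __init__(self, domains, epsilon=0.1, alpha=0.5, seed=0):
        self.values = {d: 0.0 for d in domains}
        self.epsilon = epsilon      # probability of exploring a random domain
        self.alpha = alpha          # learning rate for the value update
        self.rng = random.Random(seed)

    def choose(self):
        """Pick a domain: explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(sorted(self.values))
        return max(self.values, key=self.values.get)

    def learn(self, domain, reward):
        """Move the domain's value toward the observed feedback reward."""
        self.values[domain] += self.alpha * (reward - self.values[domain])

agent = InterestAgent(["medicine", "sports"], epsilon=0.0)
agent.learn("medicine", 1.0)   # hypothetical feedback: user opened a medical result
agent.learn("sports", 0.0)     # hypothetical feedback: user ignored a sports result
print(agent.choose())          # medicine
```

Because the update is incremental, the estimates drift when feedback changes, which is one simple way to get the adaptation to changing interests that the abstract attributes to the agents.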
8	Feedback-Driven Data Clustering Hahmann, Martin 28 October 2013 The acquisition and analysis of data has become a common yet critical task in many areas of modern economy and research. Unfortunately, the ever-increasing scale of datasets has long outgrown the capacities and abilities humans can muster to extract information from them and gain new knowledge. For this reason, research areas like data mining and knowledge discovery steadily gain importance. The algorithms they provide for the extraction of knowledge are mandatory prerequisites that enable people to analyze large amounts of information. Among the approaches offered by these areas, clustering is one of the most fundamental. By finding groups of similar objects inside the data, it aims to identify meaningful structures that constitute new knowledge. Clustering results are also often used as input for other analysis techniques like classification or forecasting. As clustering extracts new and unknown knowledge, it has no access to any form of ground truth. For this reason, clustering results have a hypothetical character and must be interpreted with respect to the application domain. This makes clustering very challenging and leads to an extensive and diverse landscape of available algorithms. Most of these are expert tools that are tailored to a single narrowly defined application scenario. Over the years, this specialization has become a major trend that arose to counter the inherent uncertainty of clustering by including as many domain specifics as possible in algorithms. While customized methods often improve result quality, they become more and more complicated to handle and lose versatility. This creates a dilemma, especially for amateur users, whose numbers are increasing as clustering is applied in more and more domains.
While an abundance of tools is offered, guidance is severely lacking and users are left alone with critical tasks like algorithm selection, parameter configuration and the interpretation and adjustment of results. This thesis aims to solve this dilemma by structuring and integrating the necessary steps of clustering into a guided and feedback-driven process. In doing so, users are provided with a default modus operandi for the application of clustering. Two main components constitute the core of said process: the algorithm management and the visual-interactive interface. Algorithm management handles all aspects of actual clustering creation and the involved methods. It employs a modular approach for algorithm description that allows users to understand, design, and compare clustering techniques with the help of building blocks. In addition, algorithm management offers facilities for the integration of multiple clusterings of the same dataset into an improved solution. New approaches based on ensemble clustering not only allow the utilization of different clustering techniques, but also ease their application by acting as an abstraction layer that unifies individual parameters. Finally, this component provides a multi-level interface that structures all available control options and provides the docking points for user interaction. The visual-interactive interface supports users during result interpretation and adjustment. For this, the defining characteristics of a clustering are communicated via a hybrid visualization. In contrast to traditional data-driven visualizations that tend to become overloaded and unusable with increasing volume/dimensionality of data, this novel approach communicates the abstract aspects of cluster composition and relations between clusters. This aspect orientation allows the use of easy-to-understand visual components and makes the visualization immune to scale related effects of the underlying data. 
This visual communication is attuned to a compact and universally valid set of high-level feedback operations that allow the modification of clustering results. Instead of technical parameters that indirectly cause changes in the whole clustering by influencing its creation process, users can employ simple commands like merge or split to directly adjust clusters. The orchestrated cooperation of these two main components creates a modus operandi in which clusterings are no longer created and discarded as a whole until a satisfying result is obtained. Instead, users apply the feedback-driven process to iteratively refine an initial solution. Performance and usability of the proposed approach were evaluated with a user study. Its results show that the feedback-driven process enabled amateur users to easily create satisfying clustering results even from different and suboptimal starting situations.
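To make the merge and split commands concrete, here is a minimal sketch that applies them directly to a list of cluster labels. The abstract does not describe how a split is computed, so the rule used here (cut the cluster at the midpoint of its widest dimension) is purely an assumption, as are the function names and sample points.

```python
def merge(labels, a, b):
    """Relabel every member of cluster b as cluster a."""
    return [a if lab == b else lab for lab in labels]

def split(labels, points, target, new_label):
    """Split cluster `target` at the midpoint of its widest dimension (assumed rule).
    Assumes the target cluster has at least one member."""
    members = [i for i, lab in enumerate(labels) if lab == target]

    def extent(d):
        values = [points[i][d] for i in members]
        return min(values), max(values)

    # Choose the dimension along which the cluster is most spread out.
    widest = max(range(len(points[0])), key=lambda d: extent(d)[1] - extent(d)[0])
    low, high = extent(widest)
    cut = (low + high) / 2
    result = list(labels)
    for i in members:
        if points[i][widest] > cut:
            result[i] = new_label
    return result

points = [(0, 0), (1, 0), (10, 0), (11, 0)]   # hypothetical 2-D data
labels = [0, 0, 1, 1]                         # hypothetical initial clustering

merged = merge(labels, 0, 1)
print(merged)                                 # [0, 0, 0, 0]
print(split(merged, points, 0, 1))            # [0, 0, 1, 1]
```

Both operations act on an existing result rather than re-running the algorithm with new parameters, which is the contrast the abstract draws between direct feedback and indirect parameter tuning.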
