Global ETD Search

31	A robust & reliable Data-driven prognostics approach based on extreme learning machine and fuzzy clustering. Javed, Kamran 09 April 2014 (has links) (PDF) Prognostics and Health Management (PHM) aims to extend the life cycle of a physical asset while reducing operating and maintenance costs. For this reason, prognostics is regarded as a key process with predictive capabilities. Accurate estimates of the Remaining Useful Life (RUL) of a piece of equipment make it possible to better define an action plan that increases safety, reduces downtime, and ensures mission completion and production efficiency. Recent studies show that data-driven approaches are increasingly applied to failure prognostics. They can be regarded as "black box" models that learn system behavior directly from condition-monitoring data in order to define the current state of the system and predict the future progression of faults. However, approximating the behavior of critical machinery is a difficult task that can lead to poor prognostics. To understand data-driven prognostics modeling, the following questions are considered. 1) How should raw monitoring data be processed to obtain suitable features that reflect the evolution of degradation? 2) How can degradation states be distinguished and failure criteria (which may vary from case to case) be defined? 3) How can one be sure that the models will be robust enough to show stable performance with uncertain inputs that deviate from learned experience, and reliable enough to handle unknown data (i.e. operating conditions, engineering variations, etc.)? 4) How can integration be achieved easily under industrial constraints and requirements? These questions are the problems addressed in this thesis. They led to the development of a new approach that goes beyond the limits of classical data-driven prognostics methods. The main contributions are as follows.
- The data-processing step is improved by introducing a new feature-extraction approach that uses trigonometric and cumulative functions and is based on three characteristics: monotonicity, trendability, and predictability. The main idea is to transform the raw data into indicators that improve the accuracy of long-term predictions.
- To account for robustness, reliability, and applicability, a new prediction algorithm is proposed: the Summation Wavelet-Extreme Learning Machine (SW-ELM). The SW-ELM ensures good prediction performance while reducing learning time. An ensemble of SW-ELMs is also proposed to quantify uncertainty and improve the accuracy of the estimates.
- Prognostics performance is further strengthened by a new health-assessment algorithm: Subtractive-Maximum Entropy Fuzzy Clustering (S-MEFC). S-MEFC is an unsupervised classification approach that uses maximum entropy inference to represent the uncertainty of multidimensional data. It can determine the number of states automatically, without human intervention.
- The final prognostics model is obtained by integrating the SW-ELM and the S-MEFC to show the evolution of machine degradation through simultaneous predictions and discrete state estimation. This scheme also makes it possible to define failure thresholds dynamically and to estimate the RUL of the monitored machines.
The developments are validated on real data from three experimental platforms: PRONOSTIA FEMTO-ST (bearing test bed), CNC SIMTech (machining cutters), and C-MAPSS NASA (turbofan engines), as well as other benchmark data. Owing to the realistic nature of the proposed RUL estimation strategy, very promising results are achieved. Nevertheless, the main perspective of this work is to improve the reliability of the prognostics model. Prognostics Data-driven Extreme learning Machine Fuzzy Clustering RUL
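The feature-selection criteria named in entry 31 (monotonicity in particular) and its cumulative features can be illustrated with a small sketch. This is not the thesis's implementation: the monotonicity formula used here (absolute difference between the number of increases and decreases, divided by the number of steps) is one common definition and is an assumption, the plain running sum stands in for whatever cumulative transform the thesis actually uses, and the function names and toy signal are hypothetical.

```python
from itertools import accumulate


def monotonicity(series):
    """Return |#increases - #decreases| / (n - 1) for a sequence of numbers.

    A value of 1.0 means the series only rises or only falls; values near 0
    mean it oscillates. This definition is an assumption for illustration.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    increases = sum(d > 0 for d in diffs)
    decreases = sum(d < 0 for d in diffs)
    return abs(increases - decreases) / len(diffs)


def cumulative_feature(series):
    """Running sum of a raw feature (hypothetical stand-in for a cumulative transform)."""
    return list(accumulate(series))


# Toy, made-up vibration-like feature that fluctuates from sample to sample.
raw = [0.2, 0.5, 0.1, 0.6, 0.3, 0.9]
print(monotonicity(raw))
print(monotonicity(cumulative_feature(raw)))
```

For a non-negative raw feature, the running sum can only grow, so its monotonicity score is higher than that of the fluctuating raw signal; this is the intuition behind preferring cumulative indicators for long-term prediction.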
32	A Recommendation System Combining Context-awarenes And User Profiling In Mobile Environment Ulucan, Serkan 01 December 2005 (has links) (PDF) Up to now, various recommendation systems have been proposed for web-based applications such as e-commerce and information retrieval, where a large number of products or a large amount of information is available. Basically, the task of the recommendation system in those applications, for example in e-commerce, is to find and recommend the most relevant items to users/customers. In this domain, the most prominent approaches are collaborative filtering and content-based filtering. These approaches are sometimes also called user profiling. In this work, a context-aware recommendation system is proposed for the mobile environment, which can also be considered an extension of the recommendation systems proposed for web-based information retrieval and e-commerce applications. In web-based information retrieval and e-commerce applications, for example in an online book store, the users' actions are independent of their instant context (location, time, etc.). In the mobile environment, however, the users' actions are strictly dependent on their instant context. These dependencies give rise to the need to filter items/actions with respect to the users' instant context. In this thesis, an approach that couples approaches from two different domains, the mobile environment and the web, is proposed. Hence, the whole approach can be separated into two phases: context-aware prediction and user profiling. In the first phase, a combination of two methods, fuzzy c-means clustering and learning automata, is used to predict the mobile user's motions in context space beforehand. This eliminates a large number of the items placed in the context space. In the second phase, hierarchical fuzzy clustering for user profiling is used to determine the best recommendation among the remaining items.
33	Análise de dados por meio de agrupamento fuzzy semi-supervisionado e mineração de textos / Data analysis using semisupervised fuzzy clustering and text mining Debora Maria Rossi de Medeiros 08 December 2010 (has links) This thesis presents a set of techniques designed to improve the data clustering process. The main goal is to provide the scientific community with a tool set for a complete analysis of the implicit structures in datasets, from the identification of these structures, allowing the use of prior knowledge about the data, to the analysis of their meaning in context. The tool set has two main parts. The first is the semi-supervised fuzzy clustering algorithm SSL+P and its upgraded version SSL+P*, which are able to take into account the available prior knowledge about the data in two forms: class labels and pairwise proximity levels, both referred to here as hints. These algorithms are also capable of adapting the distance metric to the data and the available hints. The SSL+P algorithm also searches for the ideal number of clusters for a dataset, considering the available hints. Both SSL+P and SSL+P* involve the minimization of an objective function by a Population-Based Optimization (PBO) algorithm. This thesis also provides tools that can be directly employed in this area: two modified versions of the Particle Swarm Optimization (PSO) algorithm, DPSO-1 and DPSO-2, and four different methods for initializing a population of solutions. The second main part of the tool set concerns the analysis of clusters resulting from a clustering process applied to a domain-specific dataset. A text-mining-based approach is proposed to search textual information that is available in digital repositories and related to the entities represented by the data. Next, a set of words associated with each cluster is presented to the researcher; these words can suggest information that helps identify the relations shared by objects assigned to the same cluster. Agrupamento fuzzy semi-supervisionado Mineração de textos Otimização baseada em população Population-based optimization Semisupervised fuzzy clustering Text mining
34	Agrupamento de dados fuzzy colaborativo / Collaborative fuzzy clustering Luiz Fernando Sommaggio Coletta 19 May 2011 (has links) In recent decades, data mining techniques have played an important role in many areas of human knowledge. More recently, these tools have found a place in a new and complex domain in which the data to be mined are physically distributed. In this domain, some algorithms specific to data clustering can be used, in particular some variants of the widely used Fuzzy C-Means (FCM) algorithm, which have been investigated under the name of collaborative fuzzy clustering. To overcome some of the limitations found in two of these algorithms, five new algorithms were developed in this work. These algorithms were studied in two specific application scenarios that take into account two assumptions about the data (i.e., whether the data come from the same population or from different populations). In practice, such assumptions, and the difficulty of defining some of the parameters that may be required, can guide the user's choice among the available algorithms. In this sense, illustrative examples highlight the performance differences between the algorithms studied and developed, allowing some conclusions to be derived that may be useful when applying collaborative fuzzy clustering in practice. Time, space, and communication complexity analyses were also carried out / Data mining techniques have played an important role in several areas of human knowledge. More recently, these techniques have found space in a new and complex setting in which the data to be mined are physically distributed. In this setting, algorithms for data clustering can be used, such as some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, such as collaborative and parallel fuzzy clustering. In this study, we augment two FCM-based algorithms used to cluster distributed data by arriving at constructive ways of determining essential parameters of the algorithms (including the number of clusters) and by forming a set of systematically structured guidelines for selecting the specific algorithm, depending on the nature of the data environment and the assumptions made about the number of clusters. A thorough complexity analysis covering space, time, and communication aspects is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study Descoberta de conhecimento distribuído Índices de validade Distributed knowledge discovery Validity indices
35	Aplicação de modelos de estimação de fitness em algoritmos geneticos / Fitness estimation models applied to genetic algorithms Mota Filho, Francisco Osvaldo Mendes 21 December 2005 (has links) Advisor: Fernando Antonio Campos Gomide / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Abstract: Genetic algorithms usually need a large number of fitness evaluations before a satisfactory result can be obtained. In many real-world applications, fitness evaluation may be computationally complex and costly. In these cases, time is a determining factor in the performance of genetic algorithms. Therefore, genetic algorithms should provide good solutions in a short period of time. A promising approach to alleviating the computational cost of evaluations considers the fact that it can be better to evaluate only selected individuals directly and estimate the fitness of the remaining individuals than to evaluate the whole population. This work suggests the application of fitness estimation models in genetic algorithms. More specifically, it deals with estimation models based on supervised fuzzy clustering (Fuzzy C-Means) and unsupervised fuzzy clustering (Participatory Learning). The goal is to approximate the evaluation functions through the use of fitness estimation models without significantly affecting the quality of the solutions. Initially, the fitness estimation models are compared and analyzed experimentally against other models already proposed in the literature. Their performance is evaluated using benchmark optimization problems found in the genetic algorithms literature. Next, the fitness estimation models are used to solve a real-world engineering problem, namely train scheduling on a freight rail line. This is a typical case where the performance measure of each schedule demands a considerable amount of time. Once again, the performance of the fitness estimation models is evaluated experimentally by comparing their results with those provided, for simple instances, by linear programming models and, for complex instances, by the classic genetic algorithm / Master's / Computer Engineering / Master in Electrical Engineering Algoritmos genéticos Teoria da aproximação Conjuntos fuzzy Engenharia ferroviária - Planejamento Genetic algorithms Approximation models Fuzzy clustering Train scheduling
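The idea in entry 35 of evaluating only selected individuals and estimating the fitness of the rest can be sketched as follows. This is a hypothetical illustration, not the dissertation's model: it assumes that a few representative individuals (for example, cluster centers) have already been evaluated with the true, expensive fitness function, and it estimates another individual's fitness as a weighted average using fuzzy-c-means-style membership weights. The function name, the argument layout, and the fuzzifier `m` are assumptions.

```python
def estimate_fitness(individual, representatives, m=2.0):
    """Estimate fitness from already-evaluated representatives.

    individual      -- sequence of numbers (the candidate solution)
    representatives -- list of (vector, true_fitness) pairs (hypothetical layout)
    m               -- fuzzifier greater than 1, as in fuzzy c-means (assumed value)
    """
    dists = [
        sum((a - b) ** 2 for a, b in zip(individual, vector)) ** 0.5
        for vector, _ in representatives
    ]
    # An individual that coincides with a representative takes its fitness directly.
    for dist, (_, fitness) in zip(dists, representatives):
        if dist == 0.0:
            return fitness
    # Fuzzy-c-means-style memberships: closer representatives get larger weights,
    # and the weights sum to 1.
    exponent = 2.0 / (m - 1.0)
    weights = [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]
    return sum(w * fitness for w, (_, fitness) in zip(weights, representatives))


# Two evaluated representatives of a one-dimensional toy problem.
evaluated = [((0.0,), 0.0), ((2.0,), 4.0)]
print(estimate_fitness((0.5,), evaluated))
```

In a genetic algorithm, such an estimate would replace the expensive evaluation for most of the population, with only the representatives evaluated exactly in each generation.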
36	A Method for Membership Card Generation Based on Clustering and Optimization Models in A Hypermarket Xiaojun, Chen, Bhattrai, Premlal January 2011 (has links) Context: Data mining is a technique used to find interesting and valuable knowledge in large amounts of data stored in databases or data warehouses. It encompasses classification, clustering, association rule learning, etc., whose goals are to improve commercial decisions and behaviors in organizations. Among these, hierarchical clustering is commonly used in the data selection and preprocessing step for customer segmentation in business enterprises. However, this method does not handle overlapping or diverse clusters very well. Thus, we attempt to combine clustering and optimization into an integrated, sequential approach that can be employed for segmenting customers and subsequently generating membership cards. Clustering methods are used to segment customers into groups, while optimization aids in generating the required membership cards. Objectives: Our master's thesis project aims to develop a methodological approach for segmenting customers based on their characteristics in order to define membership cards using a mathematical optimization model in a hypermarket. Methods: In this thesis, a literature review of articles was conducted using five reputable databases: IEEE, Google Scholar, Science Direct, Springer and Engineering Village. This was done to provide a background study and to gain knowledge about current research on clustering- and optimization-based methods for membership card generation in a hypermarket. Further, we also employed video interviews as a research method and a proof-of-concept implementation of our solution. Interviews allowed us to collect raw data from the hypermarket, while testing on the data produced preliminary results. This was important because the data could be regarded as a guideline for evaluating the performance of customer segmentation and membership card generation. Results: We built clustering and optimization models as a two-step sequential method. In the first step, the clustering model was used to segment customers into different clusters. In the second step, our optimization model was used to produce different types of membership cards. We tested a dataset consisting of 100 customer records, obtaining five clusters and five types of membership cards respectively. Conclusions: This research provides a basis for customer segmentation and membership card generation in a hypermarket by way of data mining techniques and optimization. Thus, through our research, an integrated and sequential approach to clustering and optimization can suitably be used for customer segmentation and membership card generation respectively. Data mining Hierarchical clustering Fuzzy clustering Optimization model Membership card Computer Sciences Datavetenskap (datalogi) Software Engineering Programvaruteknik
37	[pt] AGRUPAMENTO FUZZY APLICADO À INTEGRAÇÃO DE DADOS MULTI-ÔMICOS / [en] FUZZY CLUSTERING APPLIED TO MULTI-OMICS DATA SARAH HANNAH LUCIUS LACERDA DE GOES TELLES CARVALHO ALVES 05 October 2021 (has links) [en] Advances in technologies for obtaining multi-omic data provide different levels of molecular information that progressively increase in volume and variety. This study proposes a methodology for integrating clinical and multi-omic data, whose aim is to identify cancer subtypes using fuzzy clustering, thereby representing the gradations between different molecular profiles. A better characterization of tumors in molecular subtypes can contribute to a more personalized and assertive medicine. A classifier whose target class is defined by results from the literature indicates which omic data sets should be integrated. Next, the data sets are pre-processed to reduce their high dimensionality. The selected data are integrated and then clustered. The fuzzy C-means algorithm was chosen for its ability to consider the possibility that patients have characteristics of different groups, which is not possible with classical clustering methods. As a case study, colorectal cancer (CRC) data were used. CRC has the fourth highest incidence in the world population and the third highest in Brazil. Methylation, miRNA and mRNA expression data were extracted from The Cancer Genome Atlas (TCGA) project portal. It was observed that the addition of miRNA expression and methylation data to an mRNA expression classifier from the literature increased its accuracy by 5 percentage points. Therefore, methylation, miRNA and mRNA expression data were used in this work. The attributes of each data set were selected, obtaining a significant reduction in the number of attributes. Groups were identified using the fuzzy C-means algorithm. Varying the hyperparameters of this algorithm, the number of groups and the fuzzification parameter, made it possible to choose the best-performing combination. This choice considered the effect of parameter variation on biological characteristics, especially on the overall survival of patients. The resulting clustering showed that samples considered not grouped had biological characteristics shared between groups with different prognoses. The combination of clinical and omic data showed promising results for better predicting the phenotype. [pt] SELECAO DE ATRIBUTOS [pt] AGRUPAMENTO FUZZY [pt] INTEGRACAO DE DADOS MULTI-OMICOS [en] FEATURE SELECTION [en] FUZZY CLUSTERING [en] MULTI-OMIC DATA INTEGRATION
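Entry 37 relies on the fuzzy C-means algorithm and on tuning its two hyperparameters, the number of groups and the fuzzification parameter. The sketch below shows the textbook alternating update (weighted centers, then memberships) in plain Python, so that both hyperparameters are visible as arguments `c` and `m`. It is a minimal illustration under stated assumptions, not the thesis's pipeline: the function name, the random initialization, the fixed iteration count, and the toy points are all hypothetical.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Cluster `points` (sequences of numbers) into `c` fuzzy groups.

    c -- number of groups; m -- fuzzification parameter (must be > 1).
    Returns (centers, memberships); each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Membership update from relative distances to the centers.
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            u[i] = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for j in range(c)
            ]
    return centers, u


# Toy data: two well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(data, c=2, m=2.0)
for point, row in zip(data, memberships):
    print(point, [round(value, 3) for value in row])
```

A larger `m` makes memberships softer (closer to uniform), while values near 1 approach a hard partition, which is why the abstract treats it as a parameter to be tuned alongside the number of groups. Samples whose largest membership stays low are natural candidates for the "not grouped" category the abstract mentions.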
38	Multitemporal mapping of burned areas in mixed landscapes in eastern Zambia Malambo, Lonesome 08 December 2014 (has links) Fires occur extensively across Zambia every year, a problem recognized as a major threat to biodiversity. Yet, basic tools for mapping at a spatial and temporal scale that provide useful information for understanding and managing this problem are not available. The objectives of this research were: to develop a method to map the spatio-temporal seasonal fire occurrence using satellite imagery, to develop a technique for estimating missing data in the satellite imagery considering the possibility of change in land cover over time, and to demonstrate applicability of these new tools by analyzing the fine-scale seasonal patterns of landscape fires in eastern Zambia. A new approach for mapping burned areas uses multitemporal image analysis with a fuzzy clustering algorithm to automatically select spectral-temporal signatures that are then used to classify the images to produce the desired spatio-temporal burned area information. Testing with Landsat data (30m resolution) in eastern Zambia showed accuracies in predicting burned areas above 92%. The approach is simple to implement, data driven, and can be automated, which can facilitate quicker production of burned area information. A profile-based approach for filling missing data uses multitemporal imagery and exploits the similarity in land cover temporal profiles and spatial relationships to reliably estimate missing data even in areas with significant changes. Testing with simulated missing data from an 8-image spectral index sequence showed highly correlated (R2 of 0.78-0.92) and precise estimates (deviations 4-7%) compared to actual values. The profile-based approach overcomes the common requirement of gap-filling methods that there is gradual or no change in land cover, and provides accurate gap-filling under conditions of both gradual and abrupt changes. 
The spatio-temporal progression of landscape burning was evaluated for the 2009 and 2012 fire seasons (June-November) using Landsat data. Results show widespread burning (~ 60%) with most fires occurring late (August-October) in the season. Fire occurrence and burn patch sizes decreased with increasing settlement density and landscape fragmentation reflecting human influences and fuel availability. Small fires (< 5ha) are predominant and were significantly under-detected (>50%) by a global dataset (MODIS Burned Area Product (500m resolution)), underscoring the critical need of higher geometric resolution imagery such as Landsat imagery for mapping such fine-scale fire activity. / Ph. D. Remote sensing Burned area mapping multitemporal analysis Fuzzy clustering Scan line corrector error Landsat gap-filling Fire Zambia
39	Enhancing fuzzy associative rule mining approaches for improving prediction accuracy : integration of fuzzy clustering, apriori and multiple support approaches to develop an associative classification rule base Sowan, Bilal Ibrahim January 2011 (has links) Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests. The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with the Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets. Secondly, the initial model has been further developed by including a diversification method in order to obtain more reliable FARs and to find the best and most representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high-quality and diverse FARs, offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy data, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from the fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The diversity of the FARs is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objective optimization. The best and most diverse FARs are obtained as the DFRB for use within the Fuzzy Inference System (FIS) for prediction. The third stage of development proposes a hybrid prediction model called the Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR algorithm generates significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is that it builds a generalized prediction model able to deal with different application domains. The FACRM model is validated using different benchmark data sets from the University of California, Irvine (UCI) machine learning repository and the KEEL (Knowledge Extraction based on Evolutionary Learning) repository, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model are better than those of commonly used models on the majority of data sets. A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of the FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM with WFS have been compared with those of the FACRM and Stepwise Regression (SR) models for different data sets.
The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system. 502.85
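Entry 39 extracts fuzzy association rules by first turning quantitative data into fuzzy memberships and then running an Apriori-style search, which depends on a support measure for fuzzy itemsets. The sketch below shows one common way to define that measure and the resulting rule confidence. It is an assumption-laden illustration, not the thesis's algorithm: combining memberships with the minimum operator is only one possible choice, and the function names, record layout, and item labels are hypothetical.

```python
def fuzzy_support(records, itemset):
    """Average, over all records, of the smallest membership among the items.

    records -- list of dicts mapping a fuzzy item label to a membership in [0, 1]
    itemset -- iterable of item labels; labels missing from a record count as 0.0
    """
    items = list(itemset)
    return sum(min(r.get(item, 0.0) for item in items) for r in records) / len(records)


def fuzzy_confidence(records, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from fuzzy supports."""
    both = fuzzy_support(records, list(antecedent) + list(consequent))
    return both / fuzzy_support(records, antecedent)


# Made-up fuzzified road-traffic records (memberships would come from clustering).
traffic = [
    {"speed=high": 0.8, "flow=low": 0.6},
    {"speed=high": 0.2, "flow=low": 0.9},
    {"speed=high": 1.0, "flow=low": 0.0},
]
print(fuzzy_support(traffic, ["speed=high", "flow=low"]))
print(fuzzy_confidence(traffic, ["speed=high"], ["flow=low"]))
```

An Apriori-style miner would keep only itemsets whose fuzzy support exceeds a threshold; a multiple-support variant, as named in the abstract, would allow that threshold to differ from item to item.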
40	Development of Partially Supervised Kernel-based Proximity Clustering Frameworks and Their Applications Graves, Daniel 06 1900 (has links) The focus of this study is the development and evaluation of a new partially supervised learning framework. This framework belongs to an emerging field in machine learning that augments unsupervised learning processes with some elements of supervision. It is based on proximity fuzzy clustering, where an active learning process is designed to query for the domain knowledge required in the supervision. Furthermore, the framework is extended to the parametric optimization of the kernel function in the proximity fuzzy clustering algorithm, where the goal is to achieve interesting non-spherical cluster structures through a non-linear mapping. It is demonstrated that the performance of kernel-based clustering is sensitive to the selection of these kernel parameters. Proximity hints procured from domain knowledge are exploited in the partially supervised framework. The theoretical developments with proximity fuzzy clustering are evaluated in several interesting and practical applications. One such problem is the clustering of a set of graphs based on their structural and semantic similarity. The segmentation of music is a second problem for proximity fuzzy clustering, where the aim is to determine the points in time, i.e. boundaries, of significant structural changes in the music. Finally, a time series prediction problem using a fuzzy rule-based system is established and evaluated. The antecedents of the rules are constructed by clustering the time series using proximity information in order to localize the behavior of the rule consequents in the architecture. Evaluation of these efforts on both synthetic and real-world data demonstrates that proximity fuzzy clustering is well suited for a variety of problems. 
/ Digital Signals and Image Processing Partially supervised learning Fuzzy clustering Proximity hints Kernel-based clustering Active learning Multi-proximity clustering Time series analysis Time series clustering Structural musical segmentation Graph clustering
