151 |
Discovery and adaptation of process views. Motahari Nezhad, Hamid Reza, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2008 (has links)
Business process analysis and integration are key endeavours for today's enterprises. Recently, Web services have been widely adopted for the implementation and integration of business processes within and across enterprises. In this dissertation, we investigate the problem of enabling the analysis of service interactions in today's enterprises, both in the context of business process executions and in that of service integration. Our study shows that only a fraction of interactions in the enterprise are supported by process-aware systems. However, enabling the above-mentioned analyses requires: (i) a model of the underlying business process to be used as a reference for the analysis, and (ii) the ability to correlate events generated during service interactions into process instances. We refer to a process model and the corresponding process instances as a "process view". We propose the concept of a process space to refer to all process-related information sources in the enterprise, over which various process views are defined. We propose the design and development of a system called the "process space discovery system" (PSDS) for discovering process views in a process space. We introduce novel approaches for the correlation of events into process instances, focusing on the public processes of Web services (business protocols), and also for the discovery of business protocol models from the process instances of a process view. An analysis of service integration approaches shows that while standardisation in Web services simplifies integration at the communication level, at higher levels of abstraction (e.g., service interfaces and protocol models) services are still subject to heterogeneities. We characterise the mismatches between service interfaces and protocol specifications and introduce "mismatch patterns" to represent them. A mismatch pattern also includes an adapter template that aims at the resolution of the captured mismatch. We also propose semi-automated approaches for identifying the mismatches between the interface and protocol specifications of two services. The proposed approaches have been implemented in prototype tools and experimentally validated on synthetic and real-world datasets. The discovered process views, obtained using PSDS, can be used to perform various analyses in an enterprise, and the proposed adaptation approach facilitates the adoption of Web services in business process integration.
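As a hedged illustration of the event-correlation step, the sketch below groups interaction events into process instances by a shared correlation attribute; the event fields and the order_id key are hypothetical stand-ins for whatever correlating attributes a PSDS-style analysis would discover, not the dissertation's actual design.

```python
# Minimal sketch: correlate service interaction events into process instances
# by grouping on a candidate correlation attribute (here, a hypothetical
# "order_id"). Each resulting group is one process instance.
from collections import defaultdict

events = [
    {"ts": 1, "op": "submitOrder", "order_id": "A1"},
    {"ts": 2, "op": "submitOrder", "order_id": "B7"},
    {"ts": 3, "op": "shipOrder",   "order_id": "A1"},
    {"ts": 4, "op": "cancelOrder", "order_id": "B7"},
]

instances = defaultdict(list)
for e in sorted(events, key=lambda e: e["ts"]):
    instances[e["order_id"]].append(e["op"])

# Each value is a temporally ordered operation sequence from which a
# protocol model (e.g., a state machine) could subsequently be mined.
for key, trace in instances.items():
    print(key, "->", trace)
```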
|
152 |
Análise exploratória de dados espaciais aplicada a produtividade de milho no estado do Paraná / Exploratory analysis of spatial data applied to corn yield in the state of Paraná. Seffrin, Rodolfo, 20 April 2017 (has links)
Corn cultivation is one of the most important agricultural activities for the Brazilian economy, and the use of statistical models can support decision-making in this productive sector. The present study aimed to identify areas with correlation and spatial autocorrelation between corn yield and its predictor variables (mean temperature, rainfall, solar radiation, agricultural potential of the soil, and altitude), and to determine the spatial regression model best suited to explaining the crop's yield. The study used data from municipalities of the state of Paraná for the summer crops of the 2011/2012, 2012/2013 and 2013/2014 agricultural years. The software used for the statistical analysis and for generating the thematic maps were ArcMap 9.3 and GeoDa 1.6.7. Spatial dependence among the variables was identified using the global Moran's index (univariate and bivariate) and the local indicator of spatial association (LISA).
It was concluded that, for all years and neighbourhood criteria used, there was significant spatial autocorrelation at the 1% level for all variables. It was also verified that mean temperature, rainfall and altitude were significantly correlated (p-value < 5%) with corn yield in all years and criteria studied. The variables solar radiation and agricultural potential of the soil showed no significant correlation for some of the years (2012/2013) and neighbourhood matrices (queen contiguity and nearest neighbour). To determine the most appropriate regression model for estimating corn yield, the statistical diagnostics of the OLS (Ordinary Least Squares) regression model were used to verify whether a spatial regression model was needed to explain the data. For all agricultural years the Spatial Lag Model (SLM) was recommended, and only for the 2013/2014 agricultural year could the Spatial Error Model (SEM) also be recommended. The spatial regressions (SLM and SEM) adopted for estimating corn yield in the different years produced better results than the regression that does not incorporate the spatial autocorrelation of the data (OLS). The coefficient of determination R², the Bayesian information criterion (BIC) and the maximum log-likelihood value all showed significant improvement in the corn yield estimates when SLM and SEM were used.
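As a hedged sketch of this workflow, the fragment below reproduces the main steps (queen-contiguity weights, global Moran's I, OLS spatial diagnostics, and a spatial lag fit) with the PySAL stack; the shapefile name and column names are assumptions for illustration, not the study's actual data.

```python
# Minimal sketch of the spatial-regression workflow with the PySAL stack.
# File and column names ("parana_municipios.shp", "corn_yield", ...) are
# hypothetical placeholders.
import geopandas as gpd
import libpysal
from esda.moran import Moran
from spreg import OLS, ML_Lag

gdf = gpd.read_file("parana_municipios.shp")    # assumed municipality polygons
w = libpysal.weights.Queen.from_dataframe(gdf)  # queen contiguity weights
w.transform = "r"                               # row-standardize

y = gdf["corn_yield"].to_numpy()
X = gdf[["temp_mean", "rainfall", "altitude"]].to_numpy()

print("Moran's I:", Moran(y, w).I)              # global spatial autocorrelation

ols = OLS(y.reshape(-1, 1), X, w=w, spat_diag=True)  # LM tests point to lag vs. error
print(ols.summary)
slm = ML_Lag(y.reshape(-1, 1), X, w=w)          # Spatial Lag Model (SLM) fit
print("rho:", slm.rho)
```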
|
154 |
Canonical correlation analysis of aggravated robbery and poverty in Limpopo Province. Rwizi, Tandanai, 05 1900 (has links)
The study was aimed at exploring the relationship between poverty and aggravated robbery in Limpopo Province. Sampled secondary data on aggravated robbery offenders, obtained from the South African Police Service (SAPS), Polokwane, was used in the analysis. Empirical research on poverty and crime suggests that poverty increases vulnerability to crime. The poverty set was categorised by gender, employment status, marital status, race, age and educational attainment. The variables for aggravated robbery were house robbery, bank robbery, street/common robbery, carjacking, truck hijacking, cash-in-transit robbery and business robbery. Canonical correlation analysis was used to make inferences about the relationship between these two sets. The results revealed a significant positive correlation of 0.219 (p-value = 0.025) between poverty and aggravated robbery at the five per cent significance level. Of the thirteen variables entered into the poverty-aggravated robbery model, five emerged as statistically significant: gender, marital status, employment status, common robbery and business robbery. / Mathematical Sciences / M. Sc. (Statistics)
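A minimal sketch of the canonical correlation computation between the two variable sets is given below, using scikit-learn; the matrices are random placeholders for the poverty and robbery blocks, and the column meanings are illustrative assumptions only.

```python
# Minimal sketch of canonical correlation analysis between a "poverty" block
# and an "aggravated robbery" block. Data are random stand-ins.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # poverty indicators: gender, employment, ...
Y = rng.normal(size=(200, 7))  # robbery types: house, bank, street, ...

cca = CCA(n_components=1)
Xc, Yc = cca.fit_transform(X, Y)

# First canonical correlation = Pearson correlation of the paired scores.
r = np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1]
print(f"first canonical correlation: {r:.3f}")
```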
|
155 |
Towards on-line domain-independent big data learning : novel theories and applications. Malik, Zeeshan, January 2015 (has links)
Feature extraction is an extremely important pre-processing step for pattern recognition and machine learning problems. This thesis highlights how one can best extract features from data in a fully online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further, the proposed incremental version is combined with the extreme learning machine (ELM), in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of the ELM is combined with the proposed incremental LDA technique, and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel, purely incremental version of the slow feature analysis (SFA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further, the time series expansions of the echo state network (ESN) and radial basis functions (RBFs) are used as a pre-processor before learning. In addition, higher-order derivatives are used as a smoothing constraint on the output signal. Finally, an online extension of the generalized eigenvalue problem, derived from James Stone's criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique, to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA) for both twinned and multiple data streams are derived by using the same existing method of solving the generalized eigenvalue problem. Further, the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. A recurrent set of connections, as used by ESNs, serves as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low-dimensional manifold in a high-dimensional data space is then presented in an incremental and adaptive manner. Finally, an online, locally optimized extension of Laplacian eigenmaps is derived, termed the generalized incremental Laplacian eigenmaps technique (GENILE). Apart from the benefit of the incremental nature of the proposed manifold-based dimensionality reduction technique, the projections produced by this method are shown, in most cases, to yield better classification accuracy than the standard batch versions of these techniques, on both artificial and real datasets.
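Since each contribution rests on the generalized eigenvalue problem, the following sketch shows its batch form for LDA, with between- and within-class scatter matrices solved by scipy.linalg.eigh; the data are synthetic, and the thesis's incremental variants replace this one-shot solve with on-line updates.

```python
# Minimal sketch of the generalized eigenvalue problem underlying LDA:
# solve S_b v = lambda * S_w v for the most discriminative direction.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

mean_all = X.mean(axis=0)
S_w = np.zeros((3, 3))  # within-class scatter
S_b = np.zeros((3, 3))  # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_w += (Xc - mc).T @ (Xc - mc)
    d = (mc - mean_all).reshape(-1, 1)
    S_b += len(Xc) * (d @ d.T)

# eigh solves the symmetric-definite generalized problem S_b v = w S_w v;
# eigenvalues come back in ascending order.
eigvals, eigvecs = eigh(S_b, S_w)
w_lda = eigvecs[:, -1]  # direction with the largest generalized eigenvalue
print(eigvals[-1], w_lda)
```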
|
156 |
A framework for conducting mechanistic based reliability assessments of components operating in complex systems. Wallace, Jon Michael, 02 December 2003 (has links)
Reliability prediction of components operating in complex systems has historically been conducted in a statistically isolated manner. Current physics-based, i.e. mechanistic, component reliability approaches focus more on component-specific attributes and mathematical algorithms and not enough on the influence of the system. The result is that significant error can be introduced into the component reliability assessment process.
The objective of this study is the development of a framework that infuses the influence of the system into the process of conducting mechanistic-based component reliability assessments. The formulated framework consists of six primary steps. The first three steps, identification, decomposition, and synthesis, are qualitative in nature and employ system reliability and safety engineering principles to establish an appropriate starting point for the component reliability assessment.
The most distinctive steps of the framework are those used to quantify the system-driven local parameter space and, subsequently, to use this information to guide the reduction of the component parameter space. The local statistical space quantification step is accomplished using two newly developed multivariate probability tools: Multi-Response First Order Second Moment and Taylor-Based Inverse Transformation. Where existing joint probability models require preliminary statistical information about the responses, these models combine statistical information of the input parameters with an efficient sampling of the response analyses to produce the multi-response joint probability distribution.
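To make the first-order second-moment idea concrete, the sketch below propagates input means and variances through a response function via a first-order Taylor expansion; the response function and input statistics are invented, and the thesis's multi-response tools generalize this single-response case.

```python
# Minimal first-order second-moment (FOSM) sketch: propagate input means and
# variances through a response g via a first-order Taylor expansion.
# The response function and parameter statistics are illustrative assumptions.
import numpy as np

def g(x):
    return x[0] ** 2 + 3.0 * x[1]     # example response function

mu = np.array([2.0, 1.0])             # input means
sigma = np.array([0.1, 0.2])          # input std. deviations (independent inputs)

# Central-difference gradient of g at the mean point.
eps = 1e-6
grad = np.array([
    (g(mu + eps * np.eye(2)[i]) - g(mu - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])

mean_g = g(mu)                        # first-order mean estimate
var_g = np.sum((grad * sigma) ** 2)   # first-order variance estimate
print(mean_g, np.sqrt(var_g))
```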
Parameter space reduction is accomplished using Approximate Canonical Correlation Analysis (ACCA) employed as a multi-response screening technique. The novelty of this approach is that each individual local parameter and even subsets of parameters representing entire contributing analyses can now be rank ordered with respect to their contribution to not just one response, but the entire vector of component responses simultaneously.
The final step of the framework is the actual probabilistic assessment of the component. Variations of this final step are given to allow for the utilization of existing probabilistic methods such as response surface Monte Carlo and Fast Probability Integration.
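As a simplified illustration of the Monte Carlo option in this final step, the following sketch estimates a probability of failure by directly sampling a load and a capacity distribution; both distributions are assumed for the example (a direct, non-response-surface stand-in for the methods named above).

```python
# Minimal direct Monte Carlo probability-of-failure sketch; the capacity and
# demand distributions are illustrative assumptions, not from the study.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
capacity = rng.normal(10.0, 1.0, n)   # illustrative strength distribution
demand = rng.normal(7.0, 1.5, n)      # illustrative load distribution

pf = np.mean(demand > capacity)       # estimated probability of failure
print(f"P(failure) ~= {pf:.4f}")
```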
The framework developed in this study is implemented to conduct the finite-element-based reliability prediction of a gas turbine airfoil involving several failure responses. The framework, as implemented, resulted in a considerable improvement in the accuracy of the part reliability assessment and in an increased statistical understanding of the component's failure behavior.
|
157 |
Paintings by numbers : applications of bivariate correlation and descriptive statistics to Russian avant-garde artwork. Strugnell, James Paul, January 2017 (has links)
In this thesis artwork is defined, through analogy with quantum mechanics, as the conjoining of the nonsimultaneously measurable momentum (waves) of artwork-text (words within the primary sources and exhibition catalogues) with the position (particles) of artwork-objects (artist-productivity/exhibition-quantities). Such a proposition allows the changes within the artwork of the Russian avant-garde to be charted, as artwork-objects are juxtaposed with different artwork-texts from 1902 to 2009. The artwork of an initial period, from 1902 to 1934, is examined using primary-source artwork-text produced by Russian artists and critics in relation to the contemporaneous production levels of various types of Russian avant-garde artwork-objects. The primary sources in this dataset are those reproduced in the artwork-text produced by the 62 exhibitions described below, and those published in John E. Bowlt's 1991 edition of Russian Art of the Avant-Garde: Theory and Criticism. The production of artwork in the latter period, from 1935 to 2009, is examined through consecutive exhibitions, and through the relationship between the artwork-text produced by these exhibitions and the artwork-objects exhibited at them. The exhibitions examined within this thesis are the 62 exhibitions containing Russian avant-garde artwork held in Britain from 1935 to 2009. Content analysis, using an indices-and-symptom analytical construct, converts the textual, unstructured data of the artwork-text words into numerical, structured data of recording-unit weighted percentages, whilst artist-productivity and exhibition-quantities of types of artwork-object convert the individual artwork-objects into structured data. Bivariate correlation, descriptive statistics, graphs and charts are used to define and compare the relationships between: the recording units of the artwork-texts; the artist-productivity/exhibition-quantities of types of artwork-objects; and the structured artwork-text data and structured artwork-object data. These various correlations between structured artwork-text data and structured artwork-object data are calculated in relation to time (years) to chart the changes within these relationships. The changes within these relationships are synonymous with changes within Russian avant-garde artwork as presented from 1902 to 1934 and within the 62 British exhibitions from 1935 to 2009. Bivariate correlations between structured artwork-text data and structured artwork-object data express numerically (quantitatively) the ineffable relationships formed over time by large sets of unstructured data in the continued (re)creation of artwork.
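As a hedged sketch of the bivariate-correlation step, the fragment below correlates a recording unit's yearly weighted percentages with yearly artwork-object quantities; all values are invented placeholders for the thesis's structured data.

```python
# Minimal sketch: Pearson correlation between a recording unit's yearly
# weighted percentages (content analysis of artwork-texts) and yearly
# artwork-object exhibition quantities. All numbers are invented.
from scipy.stats import pearsonr

years = [1935, 1959, 1971, 1983, 2009]
text_weight_pct = [4.2, 3.1, 5.8, 2.4, 6.0]  # weighted % of one recording unit
objects_exhibited = [12, 9, 21, 7, 25]       # artwork-objects of one type shown

r, p = pearsonr(text_weight_pct, objects_exhibited)
print(f"r = {r:.2f}, p = {p:.3f}")
```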
|
159 |
Definição automática de classificadores fuzzy probabilísticos / Automatic design of probabilistic fuzzy classifiers. Melo Jr., Luiz Ledo Mota, 18 September 2017 (has links)
CNPq / This work presents a new approach for the automatic design of Probabilistic Fuzzy Classifiers (PFCs), which are a special case of Probabilistic Fuzzy Systems. As part of the design process, we consider dimensionality reduction methods such as principal component analysis and the Fisher discriminant. The clustering methods tested for partitioning the universe of input variables are Gustafson-Kessel and Supervised Fuzzy Clustering, both consolidated in the literature. In addition, we propose a new clustering method called Gustafson-Kessel with Focal Point as part of the automatic design of PFCs, and we test its capacity to deal with ellipsoidal and non-ellipsoidal clusters. Highly correlated data pose a challenge to fuzzy clustering because the fuzzy covariance matrix must be inverted; a regularization method is therefore necessary for this matrix, and a new one is proposed in this work.
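The thesis proposes its own regularization method; as a generic illustration of why one is needed, the sketch below shrinks a singular covariance matrix toward a scaled identity so that the inverse required by Gustafson-Kessel-style distances exists. The shrinkage form and weight are assumptions, not the thesis's formula.

```python
# Generic covariance regularization by shrinkage toward a scaled identity
# (illustrative only; not the regularization method proposed in the thesis).
import numpy as np

def regularize_covariance(cov, gamma=0.1):
    """Shrink cov toward (trace/d) * I; gamma in [0, 1] is the shrinkage weight."""
    d = cov.shape[0]
    target = (np.trace(cov) / d) * np.eye(d)
    return (1.0 - gamma) * cov + gamma * target

# Perfectly correlated columns produce a singular covariance matrix.
x = np.random.default_rng(0).normal(size=200)
X = np.column_stack([x, 2.0 * x])
cov = np.cov(X, rowvar=False)
reg = regularize_covariance(cov)
print(np.linalg.cond(cov), np.linalg.cond(reg))  # conditioning improves
```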
In the proposed PFCs, the combination of antecedents and consequents provides a rule base in which all consequents are possible in each rule, each one associated with a probability measure. In this work, the probability is calculated based on Bayes' theorem by updating, through the likelihood function, the a priori information concerning every consequent in each rule. The main innovation is the calculation of the likelihood functions, which is based on the "ideal region" concept and aims to improve the estimation of the probabilities associated with the rules' consequents. The proposed PFCs are compared with fuzzy-Bayesian classifiers and with traditional machine learning classifiers on artificially generated data, 30 different benchmarks, and data extracted directly from real-world problems, such as detecting bearing faults in industrial machines. Experimental results show that the proposed PFCs outperform, in terms of accuracy, the fuzzy-Bayesian approaches and are competitive with the traditional non-fuzzy classifiers used in the comparison. The results also show that the proposed regularization method is an alternative to the Gustafson-Kessel clustering technique (with or without focal point) when using linearly correlated data.
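A minimal sketch of the Bayesian update behind each rule's consequent probabilities follows; the prior and likelihood values are invented stand-ins for the "ideal region" likelihoods described above.

```python
# Minimal sketch: update a rule's a priori consequent probabilities with
# per-class likelihoods via Bayes' theorem. Numbers are illustrative only.
import numpy as np

prior = np.array([0.5, 0.3, 0.2])       # a priori probability of each consequent
likelihood = np.array([0.1, 0.6, 0.3])  # stand-in "ideal region" likelihoods

posterior = prior * likelihood
posterior /= posterior.sum()            # normalize (Bayes' theorem)
print(posterior)                        # updated per-consequent probabilities
```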
|