11 |
Avaliação de redes neurais competitivas em tarefas de quantização vetorial: um estudo comparativo / Evaluation of competitive neural networks in tasks of vector quantization (VQ): a comparative study
Magnus Alencar da Cruz, 06 October 2007 (has links)
não há / Esta dissertação tem como principal meta realizar um estudo comparativo do desempenho de algoritmos de redes neurais competitivas não-supervisionadas em problemas de quantização vetorial (QV) e aplicações correlatas, tais como análise de agrupamentos (clustering) e compressão de imagens. A motivação para tanto parte da percepção de que há uma relativa escassez de estudos comparativos sistemáticos entre algoritmos neurais e não-neurais de análise de agrupamentos na literatura especializada. Um total de sete algoritmos são avaliados, a saber: algoritmo K-médias e as redes WTA, FSCL, SOM, Neural-Gas, FuzzyCL e RPCL. De particular interesse é a seleção do número ótimo de neurônios. Não há um método que funcione para todas as situações, restando portanto avaliar a influência que cada tipo de métrica exerce sobre o algoritmo em estudo. Por exemplo, os algoritmos de QV supracitados são bastante usados em tarefas de clustering. Neste tipo de aplicação, a validação dos agrupamentos é feita com base em índices que quantificam os graus de compacidade e separabilidade dos agrupamentos encontrados, tais como o índice Dunn e o índice Davies-Bouldin (DB). Já em tarefas de compressão de imagens, determinado algoritmo de QV é avaliado em função da qualidade da informação reconstruída, daí as métricas mais usadas serem o erro quadrático médio de quantização (EQMQ) ou a relação sinal-ruído de pico (PSNR). Empiricamente verificou-se que, enquanto o índice DB favorece arquiteturas com poucos protótipos e o Dunn com muitos, as métricas EQMQ e PSNR sempre favorecem números ainda maiores. Nenhuma das métricas supracitadas leva em consideração o número de parâmetros do modelo. Em função disso, esta dissertação propõe o uso do critério de informação de Akaike (AIC) e o critério do comprimento descritivo mínimo (MDL) de Rissanen para selecionar o número ótimo de protótipos. Este tipo de métrica mostra-se útil na busca do número de protótipos que satisfaça simultaneamente critérios opostos, ou seja, critérios que buscam o menor erro de reconstrução a todo custo (MSE e PSNR) e critérios que buscam clusters mais compactos e coesos (índices Dunn e DB). Como conseqüência, o número de protótipos obtido pelas métricas AIC e MDL é geralmente um valor intermediário, i.e. nem tão baixo quanto o sugerido pelos índices Dunn e DB, nem tão alto quanto o sugerido pelas métricas MSE e PSNR. Outra conclusão importante é que não necessariamente os algoritmos mais sofisticados do ponto de vista da modelagem, tais como as redes SOM e Neural-Gas, são os que apresentam melhores desempenhos em tarefas de clustering e quantização vetorial. Os algoritmos FSCL e FuzzyCL são os que apresentam melhores resultados em tarefas de quantização vetorial, com a rede FSCL apresentando melhor relação custo-benefício, em função do seu menor custo computacional. Para finalizar, vale ressaltar que, qualquer que seja o algoritmo escolhido, se o mesmo tiver seus parâmetros devidamente ajustados e seus desempenhos devidamente avaliados, as diferenças de performance entre os mesmos são desprezíveis, ficando como critério de desempate o custo computacional. / The main goal of this master's thesis was to carry out a comparative study of the performance of unsupervised competitive neural network algorithms in vector quantization (VQ) tasks and related applications, such as cluster analysis and image compression.
This study is mainly motivated by the relative scarcity of systematic comparisons between neural and non-neural algorithms for VQ in the specialized literature. A total of seven algorithms are evaluated, namely: K-means, WTA, FSCL, SOM, Neural-Gas, FuzzyCL and RPCL. Of particular interest is the problem of selecting an adequate number of neurons given a particular vector quantization problem. Since there is no widespread method that works satisfactorily for all applications, the remaining alternative is to evaluate the influence that each type of evaluation metric has on a specific algorithm. For example, the aforementioned vector quantization algorithms are widely used in clustering-related tasks. For this type of application, cluster validation is based on indexes that quantify the degrees of compactness and separability among clusters, such as the Dunn index and the Davies-Bouldin (DB) index. In image compression tasks, however, a given vector quantization algorithm is evaluated in terms of the quality of the reconstructed information, so that the most used evaluation metrics are the mean squared quantization error (MSQE) and the peak signal-to-noise ratio (PSNR). This work verifies empirically that, while the Dunn index favors architectures with many prototypes and the DB index favors architectures with few, the MSE and PSNR metrics always favor architectures with even larger numbers of prototypes. None of the evaluation metrics cited previously takes into account the number of parameters of the model. Thus, this thesis evaluates the feasibility of using Akaike's information criterion (AIC) and Rissanen's minimum description length (MDL) criterion to select the optimal number of prototypes. This type of evaluation metric indeed reveals itself useful in the search for the number of prototypes that simultaneously satisfies conflicting criteria, i.e. those favoring more compact and cohesive clusters (the Dunn and DB indexes) versus those searching for very low reconstruction errors (MSE and PSNR). Thus, the number of prototypes suggested by AIC and MDL is generally an intermediate value, i.e. neither as low as that suggested by the Dunn and DB indexes, nor as high as that suggested by the MSE and PSNR metrics. Another important conclusion is that sophisticated models, such as the SOM and Neural-Gas networks, do not necessarily have the best performance in clustering and VQ tasks. For example, the FSCL and FuzzyCL algorithms present the best results in terms of the quality of the reconstructed information, with FSCL presenting the better cost-benefit ratio due to its lower computational cost. As a final remark, it is worth emphasizing that if a given algorithm has its parameters suitably tuned and its performance fairly evaluated, the differences in performance compared to other prototype-based algorithms are negligible, with the computational cost being used to break ties.
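To make the trade-off described above concrete, the sketch below (our illustration, not code from the dissertation) sweeps the number of K-means prototypes over synthetic data and contrasts the Davies-Bouldin index with AIC/MDL-style criteria built from the mean squared quantization error; the penalty terms are generic textbook forms and may differ from the exact criteria derived in the thesis.

```python
# Illustrative sketch only: assumes scikit-learn/NumPy and synthetic data, and
# uses generic AIC/MDL-style penalties (2*K*d and 0.5*K*d*ln N), which are not
# necessarily the exact expressions used in the dissertation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))              # stand-in data vectors

N, d = X.shape
for K in range(2, 16):                      # candidate codebook sizes
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    mse = km.inertia_ / N                   # mean squared quantization error
    db = davies_bouldin_score(X, km.labels_)
    aic = N * np.log(mse) + 2 * K * d                    # penalize K*d parameters
    mdl = N * np.log(mse) + 0.5 * K * d * np.log(N)
    print(f"K={K:2d}  MSE={mse:.4f}  DB={db:.3f}  AIC={aic:.1f}  MDL={mdl:.1f}")
# MSE keeps decreasing with K and DB tends to prefer few prototypes, while the
# penalized criteria are typically minimized at an intermediate K.
```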
|
12 |
Text mining se zaměřením na shlukovací a fuzzy shlukovací metody / Text mining focused on clustering and fuzzy clustering methods
Zubková, Kateřina, January 2018 (has links)
This thesis is focused on cluster analysis in the field of text mining and its application to real data. The aim of the thesis is to find suitable categories (clusters) in the transcribed calls recorded in the contact center of Česká pojišťovna a.s. by transferring these textual documents into a vector space using basic text mining methods and applying the implemented clustering algorithms. From a formal point of view, the thesis contains a description of the preprocessing and representation of textual data, a description of several common clustering methods, cluster validation, and the application itself.
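As a rough illustration of the pipeline outlined above (preprocessing, vector-space representation, clustering, validation), here is a minimal sketch assuming scikit-learn and a few placeholder English sentences; the thesis itself works with Czech call transcripts and its own implementation of the clustering algorithms.

```python
# Minimal sketch of the generic text-clustering pipeline; the documents are
# placeholders, not transcripts from the contact center.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

docs = [
    "claim about a car accident",
    "question about a life insurance premium",
    "car insurance policy renewal",
    "payment of the life insurance contract",
]

tfidf = TfidfVectorizer()                 # real use would add language-specific preprocessing
X = tfidf.fit_transform(docs)             # documents represented in a TF-IDF vector space

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster labels:", km.labels_)
print("silhouette:", silhouette_score(X, km.labels_))   # one internal validation index
```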
|
13 |
Evaluating clustering techniques in financial time series
Millberg, Johan, January 2023 (has links)
This degree project aims to investigate different evaluation strategies for clustering methods used to cluster multivariate financial time series. Clustering is a type of data mining technique with the purpose of partitioning a data set based on similarity to data points in the same cluster, and dissimilarity to data points in other clusters. By clustering the time series of mutual fund returns, it is possible to help individuals select funds matching their current goals and portfolio. It is also possible to identify outliers. These outliers could be mutual funds that have not been classified accurately by the fund manager, or potentially fraudulent practices. To determine which clustering method is the most appropriate for the current data set it is important to be able to evaluate different techniques. Using robust evaluation methods can assist in choosing the parameters to ensure optimal performance. The evaluation techniques investigated are conventional internal validation measures, stability measures, visualization methods, and evaluation using domain knowledge about the data. The conventional internal validation methods and stability measures were used to perform model selection to find viable clustering method candidates. These results were then evaluated using visualization techniques as well as qualitative analysis of the result. Conventional internal validation measures tested might not be appropriate for model selection of the clustering methods, distance metrics, or data sets tested. The results often contradicted one another or suggested trivial clustering solutions, where the number of clusters is either 1 or equal to the number of data points in the data sets. Similarly, a stability validation metric called the stability index typically favored clustering results containing as few clusters as possible. The only method used for model selection that consistently suggested clustering algorithms producing nontrivial solutions was the CLOSE score. The CLOSE score was specifically developed to evaluate clusters of time series by taking both stability in time and the quality of the clusters into account. We use cluster visualizations to show the clusters. Scatter plots were produced by applying different methods of dimension reduction to the data, Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). Additionally, we use cluster evolution plots to display how the clusters evolve as different parts of the time series are used to perform the clustering, thus emphasizing the temporal aspect of time series clustering. Finally, the results indicate that a manual qualitative analysis of the clustering results is necessary to finely tune the candidate clustering methods. Performing this analysis highlights flaws of the other validation methods, as well as allows the user to select the best method out of a few candidates based on the use case and the reason for performing the clustering.
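A simple way to picture a stability-style check of the kind discussed above is sketched below; it uses synthetic fund returns and the adjusted Rand index between clusterings of overlapping time windows, and is only an assumption-laden illustration, not the stability index or CLOSE score evaluated in the thesis.

```python
# Stability-style illustration: cluster the same funds on two overlapping time
# windows and compare the partitions. Synthetic data; not the thesis's metrics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
returns = rng.normal(size=(50, 250))        # 50 funds x 250 daily returns

def cluster_window(start, end, k=4):
    window = returns[:, start:end]          # each fund described by its returns in the window
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(window)

labels_a = cluster_window(0, 150)
labels_b = cluster_window(100, 250)
print("ARI between overlapping windows:", adjusted_rand_score(labels_a, labels_b))
# High agreement suggests a temporally stable partition; note that trivially
# coarse clusterings also tend to score well, which is the weakness noted above.
```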
|
14 |
EMG Signal Decomposition Using Motor Unit Potential Train Validity
Parsaei, Hossein 09 1900 (has links)
Electromyographic (EMG) signal decomposition is the process of resolving an EMG signal into its component motor unit potential trains (MUPTs). The extracted MUPTs can aid in the diagnosis of neuromuscular disorders and the study of the neural control of movement, but only if they are valid trains. Before the decomposition results and the motor unit potential (MUP) shape and motor unit (MU) firing pattern information related to each active MU are used for either clinical or research purposes, the validity of the extracted MUPTs needs to be confirmed.
The existing MUPT validation methods are either time-consuming or dependent on operator experience and skill. More importantly, they cannot be executed during automatic decomposition of EMG signals to assist with improving decomposition results. To overcome these issues, this thesis explores the possibility of developing automatic MUPT validation algorithms. Several methods based on a combination of feature extraction techniques, cluster validation methods, supervised classification algorithms, and multiple classifier fusion techniques were developed. The developed methods, in general, use either the MU firing pattern or the MUP-shape consistency of a MUPT, or both, to estimate its overall validity.
The performance of the developed systems was evaluated using a variety of MUPTs obtained from the decomposition of several simulated and real intramuscular EMG signals. Based on the results achieved, the methods that use only shape or only firing pattern information had higher generalization error than the systems that use both types of information. For the classifiers that use the MU firing pattern information of a MUPT to determine its validity, the accuracy for invalid trains decreases as the number of missed-classification errors in the trains increases. Likewise, for the methods that use the MUP-shape information of a MUPT to determine its validity, the classification accuracy for invalid trains decreases as the within-train similarity of the invalid trains increases. Of the systems that use both shape and firing pattern information, those that separately estimate MU firing pattern validity and MUP-shape validity and then estimate the overall validity of a train by fusing these two indices using trainable fusion methods performed better than the scheme that estimates MUPT validity using a single classifier, especially for the real data used. Overall, the multi-classifier constructed using trainable logistic regression to aggregate the base classifier outputs had the best performance, with overall accuracy of 99.4% and 98.8% for simulated and real data, respectively.
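The trainable fusion idea described above can be pictured with a minimal sketch: two per-train validity scores combined by a logistic-regression fuser. The scores, labels, and example values below are synthetic placeholders, not the features or classifiers developed in the thesis.

```python
# Hedged sketch of classifier fusion: synthetic validity scores only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 200
firing_validity = rng.uniform(0, 1, n)   # base output 1: MU firing-pattern validity
shape_validity = rng.uniform(0, 1, n)    # base output 2: MUP-shape validity
is_valid = (firing_validity + shape_validity) / 2 + rng.normal(0, 0.1, n) > 0.5

fuser = LogisticRegression().fit(np.column_stack([firing_validity, shape_validity]), is_valid)
candidate = np.array([[0.9, 0.4]])       # e.g. consistent firing but noisier MUP shapes
print("P(valid MUPT):", fuser.predict_proba(candidate)[0, 1])
```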
The possibility of formulating an algorithm for the automated editing of MUPTs contaminated with a high number of false-classification errors (FCEs) during decomposition was also investigated. Ultimately, a robust method was developed for this purpose. Using a supervised classifier and the MU firing pattern information provided by each MUPT, the developed algorithm first determines whether a given train is contaminated by a high number of FCEs and needs to be edited. For contaminated MUPTs, the method uses both MU firing pattern and MUP shape information to detect MUPs that were erroneously assigned to the train. Evaluation based on simulated and real MU firing patterns shows that contaminated MUPTs could be detected with 84% and 81% accuracy for simulated and real data, respectively. For a given contaminated MUPT, the algorithm correctly classified, on average, around 92.1% of its MUPs.
The effectiveness of using the developed MUPT validation systems and the MUPT editing methods during EMG signal decomposition was investigated by integrating these algorithms into a certainty-based EMG signal decomposition algorithm. Overall, the decomposition accuracy for 32 simulated and 30 real EMG signals was improved by 7.5% (from 86.7% to 94.2%) and 3.4% (from 95.7% to 99.1%), respectively. A significant improvement was also achieved in correctly estimating the number of MUPTs represented in a set of detected MUPs. The simulated and real EMG signals used comprised 3–11 and 3–15 MUPTs, respectively.
|
16 |
Modul shlukové analýzy systému pro dolování z dat / Cluster Analysis Module of a Data Mining System
Riedl, Pavel, January 2010 (has links)
This master's thesis deals with the development of a module for a data mining system that is being developed at FIT. The first part describes the general knowledge discovery process and cluster analysis, including cluster validation; it also describes Oracle Data Mining, including the algorithms it uses for clustering. At the end, it deals with the system and the technologies it uses, such as the NetBeans Platform and DMSL. The second part describes the design of the clustering module and of a module used to compare its results. It also deals with the visualization of cluster analysis results and presents the results achieved.
|