Global ETD Search

31	Scalable frameworks and algorithms for cluster ensembles and clustering data streams Hore, Prodip 01 June 2007 Clustering algorithms are an important tool for data mining and data analysis. They are unsupervised learning algorithms: they group patterns using a similarity metric, without an external teacher or labels. Clustering algorithms are generally iterative and computationally intensive. For data sets larger than memory they access the disk in every iteration, which makes them unacceptably slow. Processing the data in chunks that fit into memory provides a scalable framework, and multiple processors may be used to process chunks in parallel. The clustering solutions from the chunks together form an ensemble, which can be merged to provide a global solution, so merging an ensemble of clustering solutions is important for scalability. Combining multiple clustering solutions, or partitions, is also important for obtaining a robust clustering solution, merging distributed clustering solutions, and providing a knowledge-reuse and privacy-preserving data mining framework. Here we address combining multiple clustering solutions in a scalable framework. We also propose algorithms for incrementally clustering large or very large data sets. We propose an algorithm that can cluster large data sets in a single pass, and we extend it to handle infinite data streams. Such incremental/online algorithms can be used for real-time processing because they do not revisit data and can process data streams under limited buffer size and computational time. Thus, different frameworks and algorithms are proposed to address scalability issues in different settings. 
To our knowledge we are the first to introduce algorithms for merging cluster ensembles that are scalable in time and space complexity on large real-world data sets. We are also the first to introduce single-pass and streaming variants of the fuzzy c-means algorithm. We have evaluated the proposed frameworks and algorithms on both artificial and large real-world data sets, and we discuss a comparison with other relevant algorithms. These comparisons show the scalability of the new algorithms and the effectiveness of the partitions they create. Partitioning Hard-c-means Fuzzy-c-means Scalability Merging Streaming American Studies Arts and Humanities
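The chunk-and-merge idea in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: it swaps in hard k-means for fuzzy c-means to stay short, and the function names `weighted_kmeans` and `single_pass_cluster` are hypothetical. Each chunk is clustered together with the weighted centers carried over from the previous chunks, so every data point is visited only once.

```python
import random
from math import dist


def weighted_kmeans(points, weights, k, iters=25, seed=0):
    """Hard k-means (Lloyd's algorithm) on weighted points.

    Returns the k centers and the total weight assigned to each center.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumes len(points) >= k
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest current center.
            j = min(range(k), key=lambda c: dist(p, centers[c]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Weighted mean per cluster; an empty cluster keeps its old center.
        centers = [
            tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers, mass


def single_pass_cluster(chunks, k):
    """Cluster a sequence of in-memory chunks, visiting each point once."""
    centers, mass = [], []
    for chunk in chunks:
        # Previous centers re-enter as points weighted by what they summarize.
        pts = centers + list(chunk)
        wts = mass + [1.0] * len(chunk)
        centers, mass = weighted_kmeans(pts, wts, k)
    return centers, mass


chunks = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)],
    [(1.0, 0.0), (11.0, 10.0)],
]
centers, mass = single_pass_cluster(chunks, k=2)
print(centers, mass)
```

The total weight carried by the final centers equals the number of points seen, which is what lets a small set of weighted centers stand in for all earlier chunks.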
32	Methodology development algorithms for processing and analysis of optical coherence tomography images (O.C.T.) / Μεθοδολογία ανάπτυξης αλγόριθμων για την επεξεργασία και ανάλυση εικόνων τομογραφίας οπτικής συνοχής (Ο.C.Τ) Μανδελιάς, Κωνστασταντίνος 15 January 2014 Optical Coherence Tomography (OCT) is a catheter-based imaging method that employs near-infrared light to produce high-resolution cross-sectional intravascular images. A new segmentation technique is implemented for automatic lumen area extraction and stent strut detection in intravascular OCT images, for the purpose of quantitative analysis of neointimal hyperplasia (NIH). An easy-to-use graphical user interface (GUI) for everyday clinical use is also designed around the implemented algorithm. Methods: Four clinical datasets of frequency-domain OCT scans of the human femoral artery were analysed. First, a segmentation method based on Fuzzy C-Means (FCM) clustering and the Wavelet Transform (WT) was applied to extract the inner luminal contour. Subsequently, stent strut positions were detected by feeding metrics derived from the local maxima of the wavelet transform into the FCM membership function. Results: The inner lumen contour and the positions of the stent struts were extracted with very high accuracy. Compared with manual segmentation by an expert physician, the automatic segmentation had an average overlap value of 0.917 ± 0.065 for all OCT images included in the study. The proposed method and the other automatic segmentation algorithms used in this thesis (k-means, FCM, MRF-ICM and MRF-Metropolis) were also compared with the physician's manual assessments in terms of mean distance difference in mm and processing time in seconds; the proposed method gave high-accuracy results in the shortest processing time. The strut detection procedure successfully identified 9.57 ± 0.5 struts for each OCT image. Conclusions: A new fast and robust automatic segmentation technique combining FCM and WT for lumen border extraction and strut detection in intravascular OCT images was designed and implemented. 
The proposed algorithm may be employed for automated quantitative morphological analysis of in-stent neointimal hyperplasia. Optical coherence tomography (O.C.T.) Wavelets 616.075 45 Fuzzy c-means
33	Clusterização baseada em algoritmos fuzzy / Clustering based on fuzzy algorithms Lopes Cavalcanti Junior, Nicomedes January 2006 Cluster analysis is a technique applied in many areas, such as data mining, pattern recognition and image processing. Clustering algorithms aim to partition a data set into clusters such that individuals within the same cluster have a high degree of similarity, while individuals belonging to different clusters have a high degree of dissimilarity. An important division of clustering algorithms is between hard and fuzzy algorithms. Hard algorithms assign an individual to exactly one cluster. Fuzzy algorithms, by contrast, associate an individual with all clusters by varying the individual's degree of membership in each cluster. The advantage of a fuzzy clustering algorithm is that it can better represent uncertainty, which matters, for example, for showing that an individual is not a typical member of any class but is similar, to a greater or lesser degree, to more than one class. An intuitive way to measure similarity between individuals is to use distance measures such as the Euclidean distance. Many distance measures are available in the literature. Many popular clustering algorithms seek to minimize a criterion based on a distance measure. Through an iterative process, these algorithms compute parameters so as to decrease the value of the criterion from iteration to iteration until convergence is reached. The problem with many of the distances found in the literature is that they are static. 
For iterative clustering algorithms, it seems reasonable to have distances that change or update their values according to what happens to the data and to the algorithm's data structures. This dissertation presents two adaptive distances applied to the fuzzy c-means algorithm by Prof. Francisco de Carvalho. This algorithm was chosen because it is widely used. To evaluate the proposed distances, experiments were carried out on benchmark data sets and on artificial data sets (for the latter, Monte Carlo experiments were run to obtain more precise results). So far, comparisons of the fuzzy c-means versions obtained with adaptive distances against similar algorithms from the literature indicate that the new versions generally perform better than the others available in the literature. Data mining Adaptive distance Machine learning Fuzzy clustering Fuzzy c-means
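To make the hard/fuzzy distinction in the abstract above concrete, the sketch below computes the standard fuzzy c-means membership degrees of one point with respect to fixed cluster centers. It assumes the plain (static) Euclidean distance and the usual fuzzifier m > 1, not the adaptive distances of the dissertation; the function name `fcm_memberships` is hypothetical.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Membership of `point` in each cluster: u_j = 1 / sum_k (d_j / d_k)^(2/(m-1))."""
    d = [dist(point, c) for c in centers]
    if any(x == 0.0 for x in d):
        # The point coincides with one or more centers: share membership among them.
        zeros = [x == 0.0 for x in d]
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]


u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(u)  # approximately [0.9, 0.1]
```

A hard algorithm would report only the first cluster for this point; the fuzzy memberships also record that it has some similarity to the second.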
34	Solución análitica para plan de difusión y aumento de postulaciones en universidad perteneciente al sistema único de admisión / Analytical solution for an outreach plan and an increase in applications at a university belonging to the Sistema Único de Admisión Núñez Maldonado, Matías Andrés January 2018 Ingeniero Civil Industrial / Chilean higher education has enabled the development of professionals in an environment that is becoming more competitive, both in education and in the labour market. Universities play a leading role here, since they account for more than 50% of the higher-education market, which means increasing competition to enrol more students, especially among universities affiliated with the Sistema Único de Admisión (Single Admission System). Among the measures universities have taken to compete with one another and enrol more students are on-site outreach campaigns, which involve various activities carried out at the schools they consider suitable. A 1% increase in applications may generate benefits above 61 million pesos, which provides an economic justification for these activities. The objectives of this thesis therefore centre on incorporating analytical solutions, namely a recommendation engine for the on-site outreach process, built by profiling schools and optimally assigning activities, to help decide whom to target, what to do and when to do it in order to affect applications. To achieve these objectives, 8 clusters of schools are developed in order to impute data for schools where no on-site outreach activities have been carried out. The segmentation is performed with the K-means method, and the clusters are differentiated mainly by the schools' socioeconomic attributes, administrative dependency and weighted PSU score. 
In this way, 2 optimization models are run to increase the number of applications to the university under study: one that measures the individual impact of activities and includes a time dimension, and a second that measures the effect of combined activities without a time dimension. The result of the first model increases the number of applications by 2,166, which means a 30% increase in profitability for the university under study. With the second model, the number of applicants increases by 1,233, so that the university earns 17% more than in 2016. Because the work uses imputed data, an analysis of overestimation and underestimation is carried out, which concludes that in both cases the models still deliver favourable results for the university. This thesis makes it possible to use the university's existing resources to generate a suitable value proposition that seeks to increase the profitability of on-site outreach activities. Suggestions for future work that could complement the present study are also proposed. / 19/01/2023 Universities - Chile - Research Higher education - Research Fuzzy C-means
35	Ant Clustering with Consensus Gu, Yuhua 01 April 2009 Clustering is actively used in several research fields, such as pattern recognition, machine learning and data mining. This dissertation focuses on clustering algorithms in the data mining area. Clustering algorithms can be applied to the unsupervised learning problem of finding clusters in unlabeled data. Most clustering algorithms require the number of cluster centers to be known in advance. This is often unsuitable for real-world applications, where that information is usually not available. A further question is whether the clusters found by an algorithm are the right ones or whether better ones exist. In this dissertation, we present two new Swarm Intelligence based approaches to data clustering that address these issues. Swarm-based approaches to clustering have been shown to be able to skip local extrema by doing a form of global search, and our two newly proposed ant clustering algorithms take advantage of this. The first is a kernel-based fuzzy ant clustering algorithm that uses the Xie-Beni partition validity metric. It is a two-stage algorithm. In the first stage, ants move the cluster centers in feature space, and the cluster centers found by the ants are evaluated using a reformulated kernel-based Xie-Beni cluster validity metric. We found that, when provided with more clusters than exist in the data, our new ant-based approach produces a partition with empty and/or very lightly populated clusters. The second stage then automatically detects the number of clusters for a data set by using threshold solutions. The second ant clustering algorithm, which uses chemical recognition of nestmates, combines an ant-based algorithm with a consensus clustering algorithm. It is a two-stage algorithm that needs no initial knowledge of the number of clusters. 
The main contributions of this work are to use the ability of an ant-based clustering algorithm to determine the number of cluster centers and refine them, and then to apply a consensus clustering algorithm to obtain a better-quality final solution. We also introduce an ensemble ant clustering algorithm that is able to find a consistent number of clusters with appropriate parameters, and we propose a modified online ant clustering algorithm to handle large data sets. To our knowledge, we are the first to use consensus to combine multiple ant partitions to obtain robust clustering solutions. Experiments were done with twelve data sets, including benchmark data sets, two artificially generated data sets and two magnetic resonance image brain volumes. The results show how the ant clustering algorithms play an important role in finding the number of clusters and providing useful information for consensus clustering to locate the optimal clustering solutions. We conducted a wide range of comparative experiments that demonstrate the effectiveness of the new approaches. Partitioning Fuzzy C means Cluster validity Ant colony Ensemble Non-negative Matrix Factorization American Studies Arts and Humanities
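For readers unfamiliar with the Xie-Beni validity metric mentioned in the abstract above, here is a minimal sketch of the standard (non-kernel) index: fuzzy within-cluster compactness divided by the number of points times the smallest squared distance between two centers. It is not the reformulated kernel-based version used in the dissertation; the function name `xie_beni` and the toy data are assumptions for illustration. Lower values indicate compact, well-separated clusters.

```python
from math import dist


def xie_beni(points, centers, U, m=2.0):
    """Xie-Beni index; U[i][j] is the membership of point i in cluster j."""
    n = len(points)
    # Membership-weighted squared distances of points to centers.
    compactness = sum(
        U[i][j] ** m * dist(points[i], centers[j]) ** 2
        for i in range(n)
        for j in range(len(centers))
    )
    # Smallest squared distance between any two distinct centers.
    separation = min(
        dist(a, b) ** 2
        for idx, a in enumerate(centers)
        for b in centers[idx + 1:]
    )
    return compactness / (n * separation)


xb = xie_beni(
    points=[(0.0,), (4.0,)],
    centers=[(1.0,), (3.0,)],
    U=[[1.0, 0.0], [0.0, 1.0]],
)
print(xb)  # 0.25
```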
36	Modely a metody pro svozové problému v logistice / Models and methods for routing problems in logistics Muna, Izza Hasanul January 2019 The thesis focuses on optimizing vehicle routes for distribution logistics. This optimization task is known as the vehicle routing problem (VRP). The VRP has been extended in numerous directions, for instance by variations that can be combined. One extension is the capacitated VRP with stochastic demands (CVRPSD), in which the vehicle capacity limit has a non-zero probability of being violated on any route, so a failure to satisfy the demand can occur. A strategy is required for updating the routes in case of such an event; the thesis calls this strategy a recourse action. The main objective of the research is to design a model of the CVRPSD and find the optimal solution. The EEV (Expected Effective Value) and FCM (Fuzzy C-Means) - TSP (Travelling Salesman Problem) approaches are described and used to solve the CVRPSD. The results confirm that the EEV approach performs better than FCM-TSP on small CVRPSD instances. However, the EEV approach cannot solve big instances in an acceptable running time because of the complexity of the problem. In real situations, the FCM-TSP approach is more suitable for implementation than the EEV because it can find a solution in a shorter time. The disadvantage of this algorithm is that its computational time depends on the number of customers in a cluster.
37	Segmentace obrazu podle textury / Texture-Based Image Segmentation Pasáček, Václav January 2012 Image segmentation is an important step in image processing. A traditional way to segment an image is texture-based segmentation, which uses texture features to describe image texture. In this work, Local Binary Patterns (LBP) are used for image texture representation. The texture feature is a histogram of occurrences of LBP codes in a small image window. The work also compares the results of various modifications of Local Binary Patterns and their usability in image segmentation, which is done by unsupervised clustering of the texture features. The Fuzzy C-Means algorithm is used for the clustering.
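The texture feature described in the abstract above can be sketched as follows, assuming the basic 8-neighbour LBP with a neighbour bit set when the neighbour is at least as bright as the centre pixel. The neighbour ordering and the function names `lbp_code` and `lbp_histogram` are illustrative choices, not taken from the thesis, and none of its LBP modifications are shown.

```python
# 8 neighbours, clockwise from the top-left pixel (an assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of the pixel at row r, column c (must not be on the border)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(window):
    """256-bin histogram of LBP codes over the interior pixels of a window."""
    hist = [0] * 256
    for r in range(1, len(window) - 1):
        for c in range(1, len(window[0]) - 1):
            hist[lbp_code(window, r, c)] += 1
    return hist


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(lbp_code(flat, 1, 1), lbp_code(peak, 1, 1))  # 255 0
```

One histogram per window gives the feature vectors that a clustering algorithm such as fuzzy c-means can then group into texture regions.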
38	Experiments with K-Means, Fuzzy c-Means and Approaches to Choose K and C Hong, Sui 01 January 2006 A parameter specifying the number of clusters in an unsupervised clustering algorithm is often unknown. Different cluster validity indices proposed in the past have attempted to address this issue, and their performance is directly related to the accuracy of a clustering algorithm. The gap statistic proposed by Tibshirani (2001) was applied to k-means and hierarchical clustering algorithms for estimating the number of clusters and is shown to outperform other cluster validity measures, especially in the null model case. In our experiments, the gap statistic is applied to the Fuzzy c-Means (FCM) algorithm and compared to existing FCM cluster validity indices examined by Pal (1995). A comparison is also made between two initialization methods, in which centers are either randomly assigned to data points or initialized using the furthest-first algorithm (Hochbaum, 1985). The gap statistic can be applied using the FCM algorithm as long as the fuzzy partition matrix can be employed in computing the gap statistic metric, W_k. Three new methodologies are examined for computing this metric in order to apply the gap statistic to the FCM algorithm. The fuzzy partition matrix generated by FCM can also be thresholded based upon the maximum membership to allow computation similar to the k-means algorithm. This is assumed to be the current method for employing the gap statistic with the FCM algorithm and is compared to the three proposed methods. In our results, the gap statistic outperformed the cluster validity indices for FCM, and one of the new methodologies introduced for computing the metric, based upon the FCM objective function, outperformed the threshold method for m=2. FCM; Gap statistic; Machine learning Computer Engineering
39	Neki tipovi rastojanja i fazi mera sa primenom u obradi slika / Some types of distance functions and fuzzy measures with application in image processing Nedović Ljubo 23 September 2017 This thesis studies the application of fuzzy operations, especially aggregation operators, to distance functions and metrics. The original contribution of the thesis is the construction of new distance functions and metrics by applying aggregation operators to some basic distance functions and metrics. For some types of aggregation operators and basic distance functions and metrics, the properties of the distance functions and metrics constructed in this way are analyzed. For some of them, performance in image segmentation with the Fuzzy c-means algorithm is analyzed.
40	Uma nova forma de calcular os centros dos Clusters em algoritmos de agrupamento tipo fuzzy c-means / A new way of calculating cluster centers in fuzzy c-means type clustering algorithms Vargas, Rogerio Rodrigues de 30 March 2012 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is Fuzzy C-Means (FCM). This thesis proposes a new way of calculating the cluster centers in the FCM algorithm, which we call ckMeans, and which can also be applied to some variants of FCM; here we apply it in particular to variants that use other distances. The goal of this change is to reduce the number of iterations and the processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. We also developed an algorithm based on ckMeans to handle interval data, considering interval membership degrees. This algorithm allows the data to be represented without converting interval data into point data, as happens in other extensions of FCM that deal with interval data. To validate the proposed methodologies, the ckMeans clustering was compared with the K-Means algorithm (since the center calculation proposed in this work resembles that of K-Means) and the FCM algorithm, considering three different distances. Several well-known databases were used. In addition, the results of interval ckMeans were compared with those of other interval clustering algorithms when applied to an interval database containing the minimum and maximum monthly temperatures of a given year for 37 cities distributed across continents. ckMeans cluster center clustering fuzzy C-Means interval data fuzzy logic

Search results