1 |
Thresholded K-means Algorithm for Image Segmentation. Girish, Deeptha S. January 2016 (has links)
No description available.
|
2 |
Clusters Identification: Asymmetrical Case. Mao, Qian January 2013 (has links)
Cluster analysis is one of the typical tasks in data mining: it groups data objects based only on the information found in the data that describes the objects and their relationships. The purpose of this thesis is to verify a modified K-means algorithm in asymmetrical cases, which can be regarded as an extension of the research of Vladislav Valkovsky and Mikael Karlsson in the Department of Informatics and Media. In this thesis an experiment is designed and implemented to identify clusters with the modified algorithm in asymmetrical cases. The Java application developed for the experiment is based on knowledge established in previous research. The development procedure is described and the input parameters are discussed along with the analysis. The experiment consists of several test suites, each of which simulates a situation existing in the real world, and the test results are displayed graphically. The findings mainly emphasize the limitations of the algorithm, and future work for exploring the algorithm further is also suggested.
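The modified algorithm itself is not reproduced in the abstract. As a point of reference, the following is a minimal Python sketch of the standard Lloyd's K-means iteration that such modifications typically extend; all names and parameters here are illustrative, not taken from the thesis.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Standard Lloyd's K-means (baseline; the thesis's asymmetric
    modification is not reproduced here)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # update step: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # stop once assignments stabilize
            break
        centers = new_centers
    return centers, labels
```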
|
3 |
A Machine Learning Approach for Studying Linked Residential Burglaries. Márquez, Ángela Marqués January 2014 (has links)
Context. Multiple studies demonstrate that most residential burglaries are committed by a few offenders. Statistics collected by the Swedish National Council for Crime Prevention show that the number of residential burglaries varies from year to year but normally increases. Moreover, around half of all reported burglaries occur in big cities, and only some occur in sparsely populated areas. Law enforcement agencies therefore need to study possibly linked residential burglaries in their investigations. Linking crime reports is a difficult task, and currently there is no systematic way to do it. Objectives. This study presents an analysis of the different features of the residential burglaries recorded by law enforcement in Sweden. The objective is to study the possibility of linking crimes based on these features: residential features, modus operandi, victim features, goods stolen, difference in days, and distance between crimes. Methods. To reach the objectives, a quasi-experiment and repeated measures are used. Distances between crimes are obtained from routes computed with Google Maps. Different clustering methods are investigated in order to obtain the best cluster solution for linking residential burglaries. In addition, the study compares different algorithms to identify which offers the best performance in linking crimes. Results. Clustering quality is measured using different methods: the Rule of Thumb, the Elbow method and the Silhouette. To evaluate these measurements, ANOVA and the Tukey and Fisher tests are used. The Silhouette presents the highest quality level compared with the other methods. The other clustering algorithms present similar average Silhouette widths and therefore similar clustering quality. Results also show that distance, days and residential features are the most important features for linking crimes. Conclusions. The clustering suggests that it is possible to reduce the number of burglary cases by finding linked residential burglaries. After clustering, the results have to be investigated by law enforcement.
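As an illustration of the quality comparison described above, the average silhouette width of competing clusterings can be computed with scikit-learn; the models and parameters below are assumptions for the sketch, not the thesis's exact setup.

```python
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

def compare_clusterings(X, k):
    """Average silhouette width for two candidate algorithms (higher is better)."""
    models = {
        "k-means": KMeans(n_clusters=k, n_init=10, random_state=0),
        "agglomerative": AgglomerativeClustering(n_clusters=k),
    }
    # fit each model and score its labels on the same feature matrix X
    return {name: silhouette_score(X, m.fit_predict(X))
            for name, m in models.items()}
```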
|
4 |
A Novel 3-D Segmentation Algorithm for Anatomic Liver and Tumor Volume Calculations for Liver Cancer Treatment Planning. Goryawala, Mohammed 23 March 2012 (has links)
Three-dimensional (3-D) imaging is vital in computer-assisted surgical planning, including minimally invasive surgery, targeted drug delivery, and tumor resection. Selective Internal Radiation Therapy (SIRT) is a liver-directed radiation therapy for the treatment of liver cancer. Accurate calculation of anatomical liver and tumor volumes is essential for determining the tumor-to-normal-liver ratio and for calculating the dose of Y-90 microspheres that will result in a high concentration of radiation in the tumor region compared to nearby healthy tissue. Present manual techniques for segmenting the liver from Computed Tomography (CT) tend to be tedious and greatly dependent on the skill of the technician/doctor performing the task.
This dissertation presents the development and implementation of a fully integrated algorithm for 3-D liver and tumor segmentation from tri-phase CT that yields highly accurate estimates of the respective volumes of the liver and tumor(s). As designed, the algorithm requires minimal human intervention without compromising the accuracy of the segmentation results. Embedded within the algorithm is an effective method for extracting the blood vessels that feed the tumor(s), in order to plan the appropriate treatment effectively.
Segmentation of the liver led to an accuracy in excess of 95% in estimating liver volumes in 20 datasets, compared to the manual gold-standard volumes. In a similar comparison, tumor segmentation exhibited an accuracy of 86% in estimating tumor volumes. Qualitative results demonstrated the effectiveness of the blood vessel segmentation algorithm in extracting and rendering the vasculature of the liver. The parallel computing process, using a single workstation, showed a 78% gain. In addition, statistical analysis showed that the results are independent of the manual user initialization.
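One plausible reading of the volume-accuracy figures quoted above is the percent agreement between automatic and gold-standard volumes. A minimal sketch, assuming binary 3-D segmentation masks and a known voxel size (both hypothetical inputs):

```python
import numpy as np

def volume_ml(mask, voxel_mm3):
    """Volume of a binary 3-D segmentation mask in millilitres."""
    return mask.sum() * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3

def volume_accuracy(auto_mask, gold_mask, voxel_mm3):
    """Percent agreement of the automatic volume with the gold standard."""
    v_auto = volume_ml(auto_mask, voxel_mm3)
    v_gold = volume_ml(gold_mask, voxel_mm3)
    return 100.0 * (1.0 - abs(v_auto - v_gold) / v_gold)
```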
The dissertation thus provides a complete 3-D solution for liver cancer treatment planning, with the ability to extract, visualize and quantify the statistics needed for liver cancer treatment. Since SIRT requires a highly accurate calculation of the liver and tumor volumes, this new method provides the effective and computationally efficient process that such challenging clinical requirements demand.
|
5 |
Principal points, principal curves and principal surfaces. Ganey, Raeesa January 2015 (has links)
The idea of approximating a distribution is a prominent problem in statistics. This dissertation explores the theory of principal points and principal curves as methods for approximating a distribution. Principal points were introduced by Flury (1990), who tackled the problem of optimal grouping in multivariate data. In essence, principal points are the theoretical counterparts of the cluster means obtained by the k-means algorithm. Principal curves, defined by Hastie (1984), are smooth one-dimensional curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of the data. In this dissertation, the usefulness of principal points and principal curves is reviewed in detail. Their application is then extended beyond its original purpose to well-known computational methods such as Support Vector Machines in machine learning.
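The correspondence between principal points and k-means cluster means can be illustrated numerically. A sketch, assuming Python with scikit-learn: for the standard normal distribution the two principal points are known in closed form to be ±√(2/π) ≈ ±0.798, and the cluster means of a large sample converge toward them.

```python
import numpy as np
from sklearn.cluster import KMeans

# Monte Carlo estimate of the k = 2 principal points of N(0, 1).
rng = np.random.default_rng(0)
sample = rng.standard_normal(100_000).reshape(-1, 1)  # one feature column
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sample)
print(np.sort(km.cluster_centers_.ravel()))  # approx. [-0.80, 0.80]
```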
|
6 |
Initialization of the k-means algorithm: A comparison of three methods. Jorstedt, Simon January 2023 (has links)
k-means is a simple and flexible clustering algorithm that has remained in common use for over 50 years. In this thesis, we discuss the algorithm in general, its advantages and weaknesses, and how its ability to locate clusters can be enhanced with a suitable initialization method. We formulate appropriate requirements for the (batched) UnifRandom, k-means++ and Kaufman initialization methods and compare their performance on real and generated data through simulations. We find that all three methods (followed by the k-means procedure) are able to accurately locate at least up to nine well-separated clusters, but the appropriately batched UnifRandom and the Kaufman methods are both significantly more computationally expensive than the k-means++ method already for K = 5 clusters in a dataset of N = 1000 points.
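The UnifRandom and Kaufman methods are not available in standard libraries, but the flavour of the comparison can be sketched with scikit-learn's built-in "random" and "k-means++" seedings; the data parameters below mirror the K = 5, N = 1000 setting mentioned above, while everything else is an assumption.

```python
import time
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 5 well-separated clusters in 1000 points, as in the thesis setting
X, _ = make_blobs(n_samples=1000, centers=5, cluster_std=0.5, random_state=0)
for init in ("random", "k-means++"):
    t0 = time.perf_counter()
    km = KMeans(n_clusters=5, init=init, n_init=10, random_state=0).fit(X)
    print(f"{init:10s} inertia={km.inertia_:.1f} "
          f"time={time.perf_counter() - t0:.3f}s")
```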
|
7 |
Text grouping using Kullback-Leibler divergence. Darwin Junior, Willian 22 February 2016 (has links)
This work proposes a methodology for grouping texts that can be used both in textual search in general and, more specifically, in the distribution of legal cases in order to reduce the time needed to resolve judicial disputes. The proposed methodology uses the Kullback-Leibler divergence applied to the frequency distributions of the word stems occurring in the texts. Several groups of stems are considered, built from the frequency with which the stems occur across the texts, and the distributions are taken with respect to each of these groups. For each group, divergences are computed against the distribution of a reference text formed by aggregating all sample texts, yielding one value for each text in relation to each group of stems. Finally, these values are used as attributes of each text in a clustering process driven by an implementation of the K-Means algorithm, resulting in the grouping of the texts. The methodology is tested on simple toy examples and applied to concrete cases of electrical failure records, texts with common themes, and legal texts, and the result is compared with a classification performed by an expert. As byproducts of the research, a graphical environment for developing models based on Pattern Recognition and Bayesian Networks, and a study of the possibilities of using parallel processing in Bayesian Network learning, were also produced.
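The feature construction described above can be sketched as follows; the smoothing constant, the group masks and the cluster count are illustrative assumptions, not values from the work.

```python
import numpy as np
from sklearn.cluster import KMeans

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two count vectors, with additive smoothing."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def kl_features(counts, group_masks):
    """One KL value per text and per stem group, computed against the
    reference distribution formed by aggregating all sample texts."""
    reference = counts.sum(axis=0)
    feats = np.empty((len(counts), len(group_masks)))
    for j, mask in enumerate(group_masks):
        for i, row in enumerate(counts):
            feats[i, j] = kl_divergence(row[mask], reference[mask])
    return feats

# counts: (n_texts, n_stems) stem-frequency matrix; group_masks: boolean
# masks selecting stems by occurrence frequency (both assumed inputs).
# labels = KMeans(n_clusters=4, n_init=10).fit_predict(kl_features(counts, group_masks))
```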
|
8 |
Cell Formation: A Real Life Application. Uyanik, Basar 01 September 2005 (has links) (PDF)
In this study, the plant layout problem of a worldwide Printed Circuit Board (PCB) producer is analyzed. Machines are grouped into cells using three grouping methodologies: the Tabular Algorithm, the K-means clustering algorithm, and hierarchical grouping with Levenshtein distances. The production plant layouts formed by the different techniques are then evaluated using technical and economical indicators.
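A sketch of the Levenshtein-based grouping, assuming machines are encoded as part-routing strings (the encoding and the cell count below are hypothetical):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

routings = ["ABCD", "ABCE", "XYZD", "XYZC"]  # hypothetical machine routings
n = len(routings)
dist = np.array([[levenshtein(routings[i], routings[j]) for j in range(n)]
                 for i in range(n)], dtype=float)
# average-linkage hierarchical grouping, cut into two machine cells
cells = fcluster(linkage(squareform(dist), method="average"),
                 t=2, criterion="maxclust")
print(cells)  # e.g. [1 1 2 2]
```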
|
9 |
Localization of damage in isotropic structures using machine learning. Oliveira, Daniela Cabral de [UNESP] 28 June 2017 (has links)
This work introduces a new Structural Health Monitoring (SHM) methodology that uses unsupervised machine learning algorithms for damage detection and localization. The approach was tested on isotropic material (an aluminum plate); the experimental data were provided by Rosa (2016). The available database is comprehensive and includes measurements in a variety of situations. Piezoelectric transducers, acting as sensors and actuators at the same time, were bonded to an aluminum plate with dimensions of 500 x 500 x 2 mm. For data processing, the signals were analyzed over the first packet only, i.e. over a time interval equal to the duration of the excitation force; in this window there is no interference from signals reflected at the structure's boundaries. Signals were obtained in the undamaged condition (baseline) and subsequently in several damage scenarios. To evaluate how much the damage affects each path, the following metrics were implemented: maximum peak, root-mean-square deviation (RMSD), correlation between signals, and the H2 and H-infinity norms between the baseline and damaged signals. After computing the metrics for the various damage scenarios, the unsupervised K-Means machine learning algorithm was implemented in MATLAB and also tested in the Weka toolbox. Because K-Means requires the number of clusters to be specified in advance, which can hinder its use in real situations, an unsupervised machine learning algorithm based on affinity propagation was also implemented, in which the number of clusters is determined by the data similarity matrix. The affinity propagation algorithm was applied to all metrics, separately for each damage case.
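A sketch of the path metrics and the clustering step, assuming one baseline and one damaged signal per actuator-sensor path; the frequency-domain stand-ins for the H2 and H-infinity norms are simplifications of the system norms named above.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def path_metrics(baseline, damaged):
    """Damage-sensitive features for one actuator-sensor path."""
    diff = damaged - baseline
    return [
        np.abs(damaged).max(),                 # maximum peak
        np.sqrt(np.mean(diff ** 2)),           # RMSD
        np.corrcoef(baseline, damaged)[0, 1],  # correlation
        np.linalg.norm(diff),                  # residual energy (H2-like)
        np.abs(np.fft.rfft(diff)).max(),       # peak spectral deviation (H-inf-like)
    ]

# features: one row per path, built with path_metrics for every
# actuator-sensor pair (assumed available). Unlike K-Means,
# AffinityPropagation infers the number of clusters from the
# similarity matrix itself.
# labels = AffinityPropagation(random_state=0).fit_predict(np.array(features))
```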
|