61 |
Clusters (k) Identification without Triangle Inequality : A newly modelled theory / Clustering(k) without Triangle Inequality : A newly modelled theory
Narreddy, Naga Sambu Reddy; Durgun, Tuğrul (January 2012)
Cluster analysis groups data objects into meaningful and useful groups (clusters) of similar items. For example, it can be used to find groups of similar genes and proteins, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2]. Cluster analysis is one of the most widely used techniques in data mining, and a variety of clustering algorithms can be chosen depending on the complexity and amount of data in a system. K-means clustering is one of the most popular and widely used among the ten algorithms in data mining [3]. Like other clustering algorithms, it is not a silver bullet: K-means requires prior analysis and knowledge to determine the number of clusters and their centroids. Recent studies show a new approach to K-means clustering that does not require any prior knowledge for determining the number of clusters [4]. In this thesis, we propose a new clustering procedure that addresses the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters of this theory, used in identifying the number of clusters (k), are the embryo size (m), the significance level (α), the distribution (d), and the training set (n).
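To make the abstract's point about prior knowledge concrete, here is a minimal sketch of standard Lloyd-style K-means in plain Python. It is not the procedure proposed in the thesis; the function name `kmeans` and its parameters are illustrative assumptions. Note that the caller has to supply `k` up front, which is exactly the requirement the thesis sets out to remove.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length tuples.

    Illustrative sketch only: k must be chosen by the caller in advance.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids, clusters
```

Calling `kmeans(points, 2)` on two well-separated groups of points returns two centroids and the corresponding partition; choosing a different `k` changes the result, which is why estimating `k` is treated as the central problem.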
|
62 |
Clustering the Web : Comparing Clustering Methods in Swedish / Webbklustring : En jämförelse av klustringsmetoder på svenska
Hinz, Joel (January 2013)
Clustering -- automatically sorting -- web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms -- k-means, agglomerative hierarchical clustering, and bisecting k-means -- on a total of 32 corpora, as well as whether clustering web search previews, called snippets, instead of full texts can achieve reasonably decent results. Four internal evaluation metrics are used to assess the data. Results indicate that k-means performs worse than the other two algorithms, and that snippets may be good enough to use in an actual product, although there is ample opportunity for further research on both issues; however, results are inconclusive regarding bisecting k-means vis-à-vis agglomerative hierarchical clustering. The effects of stop word removal and stemming are not significant and do not appear to affect the clustering to any considerable degree.
|
63 |
Contributions à la segmentation non supervisée d'images hyperspectrales : trois approches algébriques et géométriques / Contributions to unsupervised hyperspectral image segmentation : three algebraic and geometric approaches
El Asmar, Saadallah (30 August 2016)
Over roughly the past decade, hyperspectral images produced by remote sensing systems have provided very reliable information about the spectral characteristics of the materials present in a given scene. Such images, provided by modern spectrometers, are composed of reflectance values at hundreds of narrow spectral bands covering a wide range of the electromagnetic spectrum. Since spectral reflectance differs for most of the materials or objects present in a scene, hyperspectral image processing and analysis find many real-life applications. In this work we address the problem of unsupervised hyperspectral image segmentation following three distinct approaches.
The first is of Graph Embedding type and requires two steps: first, pixels of patches of the original image are matched using a spectral similarity measure between pixels, and then objects obtained from the local segmentations are merged by means of a similarity measure between objects. The second is of Spectral Hashing or Semantic Hashing type. We first define a binary encoding of the variations of the spectral profiles and then propose a clustering-based segmentation relying on a k-modes algorithm adapted to the binary (categorical) nature of the data, the chosen distance being a generalized version of the classical Hamming distance. The third takes advantage of the Riemannian geometric information given by the manifolds arising from the different ways of representing a hyperspectral image geometrically. Using the metric properties of the space of Riemannian metrics, that is, the space of symmetric positive definite matrices endowed with the so-called Fisher-Rao metric, we again segment by clustering, with a k-means algorithm that yields a partition of the image.
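The second approach above clusters binary codes with a k-modes algorithm. The sketch below shows the two basic building blocks for binary data, assuming the classical Hamming distance rather than the generalized version used in the thesis; the helper names `hamming`, `mode_of` and `assign` are hypothetical.

```python
def hamming(a, b):
    """Classical Hamming distance: number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(codes):
    """Bitwise majority vote over binary codes (ties resolved to 1).

    For binary data this vector minimizes the total Hamming distance
    to the members, so it plays the role the mean plays in k-means.
    """
    n = len(codes)
    return tuple(1 if 2 * sum(bits) >= n else 0 for bits in zip(*codes))


def assign(codes, modes):
    """Index of the nearest mode (by Hamming distance) for every code."""
    return [min(range(len(modes)), key=lambda j: hamming(c, modes[j])) for c in codes]
```

A full k-modes loop would alternate `assign` and `mode_of` until the assignments stop changing, in the same way k-means alternates assignment and mean updates.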
|
64 |
GEOESTATÍSTICA E IMAGENS ORBITAIS PARA CARACTERIZAR A DISTRIBUIÇÃO ESPACIAL E DANOS DE LARVAS DE MELOLONTÍDEOS EM CEREAIS DE INVERNO / GEOSTATISTICS AND ORBITAL IMAGES FOR CHARACTERIZING THE SPATIAL DISTRIBUTION AND DAMAGES OF LARVAL MELOLONTÍDEOS IN WINTER CROPS
Prá, Elder Dal (01 March 2010)
Conselho Nacional de Desenvolvimento Científico e Tecnológico / This study aimed to analyze the spatial distribution of white grubs (melolonthid larvae) and the use of orbital images for identifying their damage. It is presented in two chapters: chapter one presents the geostatistical characterization of the spatial distribution of white grubs, and chapter two describes the use of orbital images for identifying white grub damage. Surveys were carried out during 2009 in São Francisco de Assis, Cruz Alta, Ijuí, Lagoa Vermelha, Vacaria and Tapejara, RS. The perimeters of the areas were marked with a navigation GPS with a handheld-computer interface, and the CR-Campeiro software was used to create the sampling grids. Population density was estimated by opening one soil trench per grid point and counting the larvae found. Spatial variability was analyzed with semivariograms, maps were generated with the ArcGIS 9.3 software, and spatial dependence was estimated using the classification of Cambardella et al. (1994). The satellite image comes from the ALOS satellite; a scene from the PRISM sensor was selected, with a spatial resolution of 2.5 m and a dimension of 35 x 70 km. The orbital image and the white grub spatial distribution map were classified in the ENVI software using unsupervised classification with the K-means algorithm; to evaluate classification accuracy, the result was related to the field truth (larvae per m²). The conclusions are that: the fitted models differ between species and areas; the white grub species show spatial dependence in all areas; the semivariograms indicate that the species differ in their spatial variability; the sampling grids and sampling technique can be used to characterize the spatial distribution of white grubs; the spatial distribution map showed the aggregated behavior of these species; white grubs influence the spectral response of the crop; the Kappa coefficient is considered good; and the ALOS image makes it possible to identify white grub damage.
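The abstract judges the classification by its Kappa coefficient. As a rough illustration of how such a coefficient is usually obtained, the sketch below computes Cohen's kappa, (p_o - p_e) / (1 - p_e), from a classified map and field truth given as two label lists. Whether the thesis used exactly this variant is an assumption, and the labels "d" (damaged) and "h" (healthy) are invented for the example.

```python
from collections import Counter


def cohen_kappa(predicted, reference):
    """Cohen's kappa between predicted classes and reference (field truth) classes."""
    n = len(reference)
    # Observed agreement: fraction of samples where both labelings agree.
    observed = sum(p == r for p, r in zip(predicted, reference)) / n
    # Expected chance agreement, from the marginal class frequencies.
    pred_counts, ref_counts = Counter(predicted), Counter(reference)
    expected = sum(pred_counts[c] * ref_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)


classified = ["d", "d", "h", "h", "h", "d", "h", "h"]
field_truth = ["d", "d", "h", "h", "d", "d", "h", "h"]
print(cohen_kappa(classified, field_truth))  # 0.75
```

In this toy case seven of eight samples agree (0.875) and chance agreement is 0.5, giving a kappa of 0.75.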
|
65 |
Nouveaux développements en histologie spectrale IR : application au tissu colique / New developments in IR spectral histology : application to colon tissue
Nguyen, Thi Nguyet Que (27 January 2016)
Continuing developments in IR vibrational microspectroscopy and in the numerical analysis of multidimensional data have recently led to the emergence of spectral histology. At the tissue level and on a biomolecular basis, this new approach is a promising tool for better analysis and characterization of different pathophysiological states, and potentially an aid to clinical diagnosis. In this work, using normal murine and human colon tissue as a model, we improved the data processing chain in order to automate and optimize spectral histology. First, a hierarchical double application of validity indices was developed to determine the optimal number of clusters needed for a complete characterization of the colon's histological structures. Second, the method was generalized to the inter-individual level by coupling EMSC (Extended Multiplicative Signal Correction) preprocessing with unsupervised k-Means clustering, the combination being applied jointly to all IR spectral images. Finally, given the rise of metaheuristics and their ability to solve complex numerical optimization problems, we adapted a memetic algorithm to IR spectral data. This algorithm consists of a genetic algorithm and a k-Means clustering refinement. Compared with conventional clustering methods, the memetic algorithm applied to IR spectral images produced an optimal, initialization-independent unsupervised clustering.
|
66 |
Využití fuzzy množin ve shlukové analýze se zaměřením na metodu Fuzzy C-means Clustering / Fuzzy Sets Use in Cluster Analysis with a Special Attention to a Fuzzy C-means Clustering Method
Camara, Assa (January 2020)
This master's thesis deals with cluster analysis, more specifically with clustering methods that use fuzzy sets. Basic clustering algorithms and the necessary multivariate transformations are described in the first chapter. In the practical part, in the third chapter, we apply fuzzy c-means clustering and k-means clustering to real data. The data used for clustering are the inputs of the chemical transport model CMAQ, which is used to approximate the concentration of air pollutants in the atmosphere. We used two different methods to select the optimal weighting exponent for finding structure in the data. We compared all three resulting data structures. The structures resembled each other, but with fuzzy c-means clustering one of the clusters did not resemble any of the clustering inputs. The end of the third chapter is dedicated to an attempt to find a regression model that captures the relationship between the inputs and outputs of the CMAQ model.
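The weighting exponent mentioned above controls how soft the fuzzy c-means memberships are. The sketch below uses the standard membership formula u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)) for a single point; the function name `memberships` and the default m = 2 are illustrative assumptions, not values taken from the thesis.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (weighting exponent m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # The point coincides with one or more centers: share full membership among them.
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


print(memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # [0.8, 0.2]
```

The memberships of a point always sum to 1. As m approaches 1 the assignment becomes nearly hard, as in k-means, while larger m spreads membership more evenly across clusters.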
|
67 |
Segmentace řeči / Speech segmentation
Andrla, Petr (January 2010)
As part of this master's thesis, a program for segmenting speech into phonemes was created. The program was written in Matlab and consists of several scripts. It performs automatic segmentation. Speech segmentation is the process of identifying the boundaries between phonemes in spoken natural languages. The automatic segmentation is based on vector quantization. In the first step of the algorithm, feature extraction is performed. Speech segments are then assigned to the calculated centroids, and each position where the assigned centroid changes is marked as a phoneme boundary. Audio recordings were processed by the program and the performance of the automatic segmentation was analysed. A detailed manual for the program was also written. The thesis briefly describes the individual speech processing methods used, their implementation in the program, and the reasons for the chosen parameter settings.
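The boundary rule described above (assign each segment to a centroid, mark a boundary wherever the assigned centroid changes) can be sketched in a few lines. The thesis program was written in Matlab; this Python version is only an illustration, and the function name `segment_boundaries`, the one-dimensional feature vectors and the centroid values are all hypothetical.

```python
def segment_boundaries(frames, centroids):
    """Assign each feature frame to its nearest centroid and report where the label changes."""
    labels = [
        min(
            range(len(centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(frame, centroids[j])),
        )
        for frame in frames
    ]
    # A boundary is the index of the first frame whose centroid differs from the previous frame's.
    boundaries = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, boundaries


frames = [(0.1,), (0.0,), (0.9,), (1.1,), (0.2,)]
centroids = [(0.0,), (1.0,)]
print(segment_boundaries(frames, centroids))  # ([0, 0, 1, 1, 0], [2, 4])
```

In practice the frames would be feature vectors from the extraction step and the centroids would come from training the vector quantizer.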
|
68 |
Hajnalova linie v současné Evropě / The Hajnal line in contemporary Europe
Chráska, Miroslav (January 2020)
This master's thesis examines whether the division of countries defined in 1965 in John Hajnal's theoretical concept still holds in contemporary Europe. The main aim was to reanalyze the original division of countries using cluster analysis based on demographic indicators: the average age of men and women at first marriage, the average age of women at first childbirth, the number of divorces per 100 marriages, and the proportions of live births within and outside marriage. The data came from the Eurostat database for 1990 to 2015. Cluster analyses of European countries were also performed according to the value orientations of their inhabitants in the areas of social relations and life expectations. Respondents' statements came from the European Social Survey from 2002 to 2018. Cluster analysis of the selected demographic indicators did not confirm the two models of marital behavior in Hajnal's concept. Cluster analyses of respondents' value orientations confirmed the existence of two value approaches to life priorities: a preference for traditionally accepted values and a preference for a dynamic and efficient lifestyle. Keywords: Hajnal line, family, marriage, divorce rate, ESS research project, K-means cluster analysis, values
|
69 |
Using Machine Learning for Predictive Maintenance in Modern Ground-Based Radar Systems / Användning av maskininlärning för förutsägbart underhåll i moderna markbaserade radarsystem
Faraj, Dina (January 2021)
Military systems are often part of critical operations where unplanned downtime should be avoided at all costs. Using modern machine learning algorithms, it could be possible to predict when and where a fault is likely to occur, which allows time for ordering replacement parts and scheduling maintenance. This thesis is a proof-of-concept study of anomaly detection in monitoring data, i.e., sensor data from a ground-based radar system, as an initial experiment to showcase predictive maintenance. The data was generated by a Saab Giraffe 4A radar during normal operation, i.e., no anomalous data with known failures was provided. The problem is originally an unsupervised machine learning problem since the data is unlabeled. Speculative binary labels (start-up state and steady state) are introduced to approximate a classification accuracy. The system functions correctly in both phases, but the monitoring data looks different. By showing that the two phases can be distinguished, it is reasonable to assume that anomalous data during a breakdown can be detected as well. Three machine learning classifiers are evaluated on their ability to detect the start-up phase each time the system is turned on: two unsupervised classifiers, K-means clustering and isolation forest, and one supervised classifier, logistic regression. The classifiers are evaluated graphically and by their accuracy score. All three classifiers recognize a start-up phase for at least four out of seven subsystems. Judging by accuracy score alone, logistic regression appears to outperform the other models. The collected results demonstrate that it is possible to distinguish between start-up and steady state in both a supervised and an unsupervised setting. Selecting the most suitable classifier requires further experiments on larger data sets.
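Scoring an unsupervised method such as K-means against binary labels needs one extra step, because cluster ids are arbitrary: cluster 0 may correspond to either phase. The sketch below shows one simple way to handle this for two clusters, by taking the better of the two possible mappings. It is an assumption about how such an accuracy could be approximated, not the evaluation code of the thesis, and the name `binary_clustering_accuracy` is hypothetical.

```python
def binary_clustering_accuracy(cluster_ids, labels):
    """Accuracy of a two-cluster result against 0/1 labels, up to swapping the cluster ids."""
    matches = sum(c == l for c, l in zip(cluster_ids, labels))
    # If fewer than half match, the ids are simply swapped relative to the labels.
    return max(matches, len(labels) - matches) / len(labels)


# 1 = start-up, 0 = steady state in the labels; the cluster ids happen to be inverted.
cluster_ids = [1, 1, 1, 0, 0, 0, 0, 1]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
print(binary_clustering_accuracy(cluster_ids, labels))  # 0.875
```

A supervised classifier such as logistic regression does not need this step, since it is trained on the labels directly, which is one reason accuracy alone may favor it in a comparison like this.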
|
70 |
An automated approach to clustering with the framework suggested by Bradley, Fayyad and Reina
Berglund, Jesper (January 2018)
Clustering with the framework suggested by Bradley, Fayyad and Reina (BFR) allows for great scalability. However, practical challenges appear when applying the framework. One of them is defining the model parameters, including the number of clusters (K). Understanding how parameter values affect the final clustering may be challenging even with insight into the algorithm. Automating the clustering would allow for more widespread use. The research question is thus: how could an automated process for clustering with BFR be defined, and what results could such a process yield? A tailored method for parameter optimization is suggested. This method is used with a new and computationally advantageous cluster validity index called the population density index. Computing the widely used within-set sum of squares error requires an additional pass over the data set; computing the population density index does not. The final step of the automated process is to cluster with the parameters generated in the process, and the outcomes of these clusterings are measured. The results present data collected over 100 identically defined runs of the automated process. They show that 97 % of the identified K values fall within a range based on the optimal value 2. The method for optimizing parameters clearly yields parameters that outperform randomized parameters. The suggested population density index has a correlation coefficient of 1.00 with the commonly used within-set sum of squares error in a 32-dimensional case. An automated process for clustering with BFR has been defined.
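For reference, the within-set sum of squares error that the abstract uses as a baseline can be sketched as below. This is the standard definition, written as an illustration with a hypothetical function name; the population density index proposed in the thesis is not reproduced here because the abstract does not define it. The loop over every point is the extra pass over the data that the abstract refers to.

```python
def within_set_sum_of_squares(clusters):
    """Sum of squared Euclidean distances from every point to its own cluster centroid."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(c) / len(members) for c in zip(*members)]
        # This inner sum visits every point once: the additional pass over the data set.
        total += sum(sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members)
    return total


print(within_set_sum_of_squares([[(0, 0), (2, 0)], [(5, 5)]]))  # 2.0
```

Lower values indicate tighter clusters, but the value also decreases as K grows, so it is usually compared across candidate K values rather than read in isolation.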
|