  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Algorithmes de classification répartis sur le cloud / Distributed clustering algorithms over a cloud computing platform

Durut, Matthieu 28 September 2012
The subjects addressed in this thesis are inspired by research problems faced by the company Lokad. These problems relate to the challenge of designing efficient parallelization techniques for unsupervised classification (clustering) algorithms on a Cloud Computing platform.
Chapter 2 provides an introduction to Cloud Computing technologies, especially those devoted to intensive computations. Chapter 3 details Microsoft's Cloud Computing offering, Windows Azure. The following chapter covers technical aspects of cloud application development and provides some cloud design patterns. Chapter 5 is dedicated to the parallelization of a well-known clustering algorithm, Batch K-Means. It provides insights into the challenges of a cloud implementation of distributed Batch K-Means, especially the impact of communication costs on the efficiency of the implementation. Chapters 6 and 7 are devoted to the parallelization of another clustering algorithm, Vector Quantization (VQ). Chapter 6 analyzes different parallelization schemes for VQ and presents the speedups to convergence that each provides. Chapter 7 provides a cloud implementation of these schemes. It highlights that the online nature of the VQ technique is what enables an asynchronous cloud implementation, which drastically reduces the communication costs introduced in Chapter 5.
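The contrast this abstract draws between Batch K-Means and online VQ can be illustrated with a minimal single-machine sketch. It is not the thesis's Windows Azure implementation: the function names (`nearest`, `batch_kmeans_step`, `online_vq_step`) and the fixed step size are assumptions made for illustration. A batch step needs a full pass over the data before any centroid changes, which in a distributed setting implies a synchronization point. An online step moves one centroid per data point, so workers can in principle proceed without waiting for each other.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def batch_kmeans_step(points, centroids):
    """One batch iteration: assign every point, then recompute all centroids."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    # A cluster that received no points keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def online_vq_step(point, centroids, step):
    """One online update: move only the winning centroid toward `point`.

    `step` is a hypothetical fixed learning rate in (0, 1]; practical VQ
    schemes usually decrease it over time.
    """
    j = nearest(point, centroids)
    centroids[j] = [c + step * (p - c) for p, c in zip(point, centroids[j])]
    return j
```

Calling `batch_kmeans_step` repeatedly until the centroids stop changing gives the batch algorithm, while feeding points one at a time to `online_vq_step` gives the online variant.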
32

Klustringsanalys av driftarbanor i norska havet [Cluster analysis of drifter trajectories in the Norwegian Sea]

Brask, Axel, Fageräng, Rasmus January 2023
No description available.
33

Improving Document Clustering by Refining Overlapping Cluster Regions

Upadhye, Akshata Rajendra January 2022
No description available.
34

Clustering Analysis of Nuclear Proliferation Resistance Measures

Jankovsky, Zachary Kyle 02 October 2014
No description available.
35

Using Hadoop to Cluster Data in Energy System

Hou, Jun 03 June 2015
No description available.
36

Optimization Approaches for Modeling Sustainable Food Waste Management Systems

Kuruppuarachchi, Lakshika Nishadhi 15 September 2022
No description available.
37

Disaster detection using real-time and historical Twitter data analysis

Åslund, Emelie January 2022
No description available.
38

A comparison of clustering techniques for short social text messages / En jämförelse av tekniker för klustring av korta sociala textmeddelanden

Ranby, Erik January 2016
The number of social text messages authored each day is huge, and the information they contain is potentially very valuable. Software that can cluster these messages, and thereby help analyze them, would consequently be useful. This thesis explores several ways of clustering social text messages. Two algorithms, in several setups, were tested and evaluated with the same input data. Based on these evaluations, a comparison was conducted to determine which algorithm setup is best suited to the task.
The two clustering algorithms that were the main subjects of the comparison are K-means and agglomerative hierarchical clustering. All setups were run with 3-grams as well as with only single words as features. The evaluation measures used were intra-cluster distance, inter-cluster distance, and silhouette value. Intra-cluster distance is the distance between data points in the same cluster, while inter-cluster distance is the distance between clusters. Silhouette value is another, more general evaluation measure that is often used to estimate the quality of a clustering. The results showed that if running time is a high priority, K-means without 3-grams is preferred. On the other hand, if the quality of the clusters matters more than performance, introducing 3-grams together with either of the two algorithms is the better choice.
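The three evaluation measures named in this abstract can be sketched in a few lines. The abstract does not state the exact definitions used in the thesis, so the choices below are assumptions: intra-cluster distance is taken as the mean pairwise Euclidean distance within a cluster, inter-cluster distance as the distance between cluster centroids, and the silhouette follows the common definition s = (b - a) / max(a, b). All function names are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_intra_cluster_distance(cluster):
    """Mean pairwise distance between points of one cluster (assumed definition)."""
    pairs = [(p, q) for i, p in enumerate(cluster) for q in cluster[i + 1:]]
    return sum(dist(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def inter_cluster_distance(a, b):
    """Distance between the centroids of two clusters (assumed definition)."""
    return dist(centroid(a), centroid(b))


def silhouette(clusters):
    """Mean silhouette value over all points; expects at least two clusters."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a: mean distance to the other members of the point's own cluster
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for k, other in enumerate(clusters)
                if k != i
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

A silhouette close to 1 indicates compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points.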
39

Approximation to K-Means-Type Clustering

Wei, Yu 05 1900
Clustering involves partitioning a given data set into several groups based on some similarity/dissimilarity measurements. Cluster analysis has been widely used in information retrieval, text and web mining, pattern recognition, image segmentation, and software reverse engineering.
K-means is the most intuitive and popular clustering algorithm and the workhorse of clustering. However, classical K-means suffers from several flaws. First, the algorithm is very sensitive to the initialization method and can easily be trapped at a local minimum of the measure used in the model (the sum of squared errors). Second, it has been proved that finding a global minimum of the sum of squared errors is NP-hard even when k = 2. In the present model for K-means clustering, all the variables are required to be discrete and the objective is nonlinear and nonconvex.
In the first part of the thesis, we consider how to derive an optimization model for the minimum sum of squared errors of a given data set based on continuous convex optimization. For this, we first transform K-means clustering into a novel optimization model, 0-1 semidefinite programming, in which the eigenvalues of the matrix argument involved must be 0 or 1. This provides a unified framework for many other clustering approaches, such as spectral clustering and normalized cut. Moreover, the new optimization model also allows us to attack the original problem through relaxed linear and semidefinite programming.
We then consider how to obtain a feasible solution of the original clustering problem from an approximate solution of the relaxed problem. Using principal component analysis, we construct a rounding procedure to extract a feasible clustering and show that our algorithm can provide a 2-approximation to the global solution of the original problem. The complexity of our rounding procedure is O(n^(k^2(k-1)/2)), which substantially improves on a similar rounding procedure in the literature with complexity O(n^(k^3/2)). In particular, when k = 2, our rounding procedure runs in O(n log n) time. To the best of our knowledge, this is the lowest complexity reported in the literature for finding a solution to K-means clustering with guaranteed quality.
In the second part of the thesis, we consider approximation methods for so-called balanced bi-clustering. Using a simple heuristic, we prove that we can slightly improve the constrained K-means for bi-clustering. For the special case where the size of each cluster is fixed, we develop a new algorithm, called Q-means, to find a 2-approximation solution to balanced bi-clustering. We prove that Q-means has complexity O(n^2).
Numerical results based on our approaches are reported in the thesis as well. / Thesis / Master of Science (MSc)
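The step from the K-means objective to a matrix with 0/1 eigenvalues can be checked numerically. The sketch below assumes one common matrix form, which is not quoted from the thesis: for data rows w_i and a cluster assignment, let Z have entries 1/|C| when points i and j share cluster C and 0 otherwise. Then Z is a symmetric projection matrix (so its eigenvalues are 0 or 1), its rows sum to 1, its trace equals k, and the sum of squared errors equals Tr(W Wᵀ (I − Z)). All function names are hypothetical.

```python
def sse(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for members in clusters.values():
        mean = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(
            sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members
        )
    return total


def projection_matrix(labels):
    """Z[i][j] = 1/|cluster| if points i and j share a cluster, else 0."""
    sizes = {label: labels.count(label) for label in set(labels)}
    return [
        [1.0 / sizes[a] if a == b else 0.0 for b in labels] for a in labels
    ]


def trace_objective(points, Z):
    """Tr(W W^T (I - Z)), where row i of W is points[i]."""

    def dot(p, q):
        return sum(x * y for x, y in zip(p, q))

    n = len(points)
    squared_norms = sum(dot(points[i], points[i]) for i in range(n))
    weighted = sum(
        Z[i][j] * dot(points[i], points[j])
        for i in range(n)
        for j in range(n)
    )
    return squared_norms - weighted
```

For any labeling, `sse(points, labels)` and `trace_objective(points, projection_matrix(labels))` agree up to floating-point error, which is what lets the discrete clustering problem be restated as an optimization over matrices Z.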
40

A comparison of driving characteristics and environmental characteristics using factor analysis and k-means clustering algorithm

Jung, Heejin 19 September 2012
The dissertation aims to classify drivers based on driving and environmental behaviors. The research determined significant factors using factor analysis, identified different driver types using k-means clustering, and studied how the same drivers map into each classification domain. The research consists of two case studies. In the first case study, a new variable is proposed and then used for classification. The drivers were divided into three groups. Two alternatives were designed to evaluate the environmental impact of changes in driving behavior. In the second case study, two types of data sets were constructed: driving data and environmental data. The driving data represents the driving behavior of individual drivers. The environmental data represents emissions and fuel consumption estimated by microscopic energy and emissions models. Significant factors were explored in each data set using factor analysis, and a pair of factors was defined for each data set. Each pair of factors was used for its own k-means clustering: driving clustering and environmental clustering. The factors were then used to identify groups of drivers in each clustering domain. In the driving clustering, drivers were grouped into three clusters. In the environmental clustering, drivers were grouped into two clusters. The groups from the driving clustering were compared with the groups from the environmental clustering in terms of emissions and fuel consumption. The three groups of drivers from the driving clustering were also mapped into the environmental domain. The results indicate that the differences in driving patterns among the three driver groups significantly influenced the emissions of HC, CO, and NOx. It was therefore determined that average target operating acceleration and braking had an essential influence on the amount of HC, CO, and NOx emissions.
Consequently, if drivers were to change their driving behavior to be more defensive, emissions of HC, CO, and NOx would be expected to decrease. It was also found that spacing-based driving tended to produce fewer emissions but consumed more fuel than the other groups, while speed-based driving produced relatively more emissions. By contrast, the defensively moderate drivers consumed less fuel and produced fewer emissions. / Ph. D.
