Global ETD Search

31	Algorithmes de classification répartis sur le cloud / Distributed clustering algorithms over a cloud computing platform Durut, Matthieu 28 September 2012 The research topics addressed in this manuscript concern the parallelization of unsupervised classification (clustering) algorithms on Cloud Computing platforms. Chapter 2 gives an overview of these technologies and presents Cloud Computing in general terms as a computing platform. Chapter 3 presents Microsoft's cloud offering, Windows Azure. The following chapter analyzes some technical challenges in designing cloud applications and proposes software architecture elements for such applications. Chapter 5 analyzes the first clustering algorithm studied, Batch K-Means. In particular, it examines how distributed versions of this algorithm must be adapted to a cloud architecture, and shows the impact of communication costs on the algorithm's efficiency when it is implemented on a cloud platform. Chapters 6 and 7 present work on parallelizing another clustering algorithm, Vector Quantization (VQ). Chapter 6 explores which parallelization schemes are likely to give satisfactory results in terms of convergence speedup. Chapter 7 presents an implementation of these schemes. The practical details of the implementation underline a result of prime importance: it is the online nature of VQ that makes an asynchronous implementation of the distributed algorithm possible, thereby removing some of the communication problems encountered when parallelizing Batch K-Means. /
The subjects addressed in this thesis are inspired by research problems faced by the Lokad company. These problems relate to the challenge of designing efficient parallelization techniques for clustering algorithms on a Cloud Computing platform. Chapter 2 provides an introduction to Cloud Computing technologies, especially those devoted to intensive computations. Chapter 3 details more specifically Microsoft's Cloud Computing offering: Windows Azure. The following chapter details technical aspects of cloud application development and provides some cloud design patterns. Chapter 5 is dedicated to the parallelization of a well-known clustering algorithm: Batch K-Means. It provides insights into the challenges of a cloud implementation of distributed Batch K-Means, especially the impact of communication costs on the implementation's efficiency. Chapters 6 and 7 are devoted to the parallelization of another clustering algorithm, Vector Quantization (VQ). Chapter 6 analyzes different parallelization schemes for VQ and presents the speedups to convergence that each provides. Chapter 7 provides a cloud implementation of these schemes. It highlights that it is the online nature of the VQ technique that enables an asynchronous cloud implementation, which drastically reduces the communication costs introduced in Chapter 5. K-means algorithm Cloud computing K-means clustering
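The contrast the abstract draws between Batch K-Means and online VQ can be sketched as follows. This is a minimal single-machine illustration, not the thesis's cloud implementation: the function names, the squared Euclidean distance and the fixed `step` parameter are all assumptions made for the example. The batch step needs a full pass over the data before any centroid moves, whereas the online step updates one centroid per sample, which is the property the abstract credits with making an asynchronous distributed version possible.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def batch_kmeans_step(points, centroids):
    """One batch iteration: assign every point, then recompute all centroids."""
    sums = [[0.0] * len(c) for c in centroids]
    counts = [0] * len(centroids)
    for x in points:
        j = nearest(x, centroids)
        counts[j] += 1
        for d, v in enumerate(x):
            sums[j][d] += v
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]


def online_vq_step(x, centroids, step):
    """Online VQ update: move only the winning centroid toward x, in place."""
    j = nearest(x, centroids)
    centroids[j] = [c + step * (v - c) for c, v in zip(centroids[j], x)]
    return j
```

In a distributed setting, the batch step implies a synchronization point per iteration, while online updates can in principle be applied as samples arrive; how the thesis merges updates across workers is not stated in the abstract and is not modeled here.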
32	Klustringsanalys av driftarbanor i norska havet Brask, Axel, Fageräng, Rasmus January 2023 No description available. Clustering Spectral clustering K-means Convex sets Oceanography Drift measurements Mathematics
33	Improving Document Clustering by Refining Overlapping Cluster Regions Upadhye, Akshata Rajendra January 2022 No description available. Information Science document cluster overlapping embeddings purity silhouette K-means
34	Clustering Analysis of Nuclear Proliferation Resistance Measures Jankovsky, Zachary Kyle 02 October 2014 No description available. Nuclear Engineering
35	Using Hadoop to Cluster Data in Energy System Hou, Jun 03 June 2015 No description available. Computer Science Hadoop K-means energy data clustering analysis
36	Optimization Approaches for Modeling Sustainable Food Waste Management Systems Kuruppuarachchi, Lakshika Nishadhi 15 September 2022 No description available. Industrial Engineering
37	Disaster detection using real-time and historical Twitter data analysis Åslund, Emelie January 2022 No description available. Disaster Twitter K-means Clustering Computer Sciences Datavetenskap (datalogi)
38	A comparison of clustering techniques for short social text messages / En jämförelse av tekniker för klustring av korta sociala textmeddelanden Ranby, Erik January 2016 The number of social text messages authored each day is huge, and the information they contain is potentially very valuable. Software that can cluster these messages, and thereby help analyze them, would consequently be helpful. This thesis explores several ways of clustering social text messages. Two algorithms, in several setups, were tested and evaluated on the same input data. Based on these evaluations, a comparison was conducted to determine which algorithm setup is best suited to the task. The two clustering algorithms that are the main subjects of the comparison are K-means and agglomerative hierarchical clustering. All setups were run with 3-grams as well as with only single words as features. The evaluation measures used were intra-cluster distance, inter-cluster distance and silhouette value. Intra-cluster distance is the distance between data points in the same cluster, while inter-cluster distance is the distance between the clusters. Silhouette value is another, more general evaluation measure that is often used to estimate the quality of a clustering. The results showed that if running time is a high priority, K-means without 3-grams is preferred. On the other hand, if cluster quality matters more than performance, introducing 3-grams with either of the two algorithms is the better choice. /
The volume of social text messages written every day is enormous, and the information in them can be very valuable. Software that can cluster and thereby analyze these messages can therefore be useful. This thesis explores several ways of clustering social text messages. Two algorithms and several configurations of these algorithms were tested and evaluated on the same input data. Based on these evaluations, a comparison was carried out to answer the question of which configuration is best suited to its purpose. The two clustering algorithms primarily compared are K-means and agglomerative hierarchical clustering. All configurations were run both with and without 3-grams as a complement to single words only. The evaluation measures used were intra-cluster distance, inter-cluster distance and silhouette value. Intra-cluster distance is the distance between data points in the same cluster, while inter-cluster distance is the distance between the different clusters. Silhouette value is another, more general evaluation measure that is often used to estimate the quality of a clustering. The results showed that K-means without 3-grams is preferable when running time is the top priority. On the other hand, if the quality of the clustering matters more than the algorithm's performance, 3-grams should be used together with either of the two algorithms. clustering K-means hierarchical Computer Sciences Datavetenskap (datalogi)
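The silhouette value mentioned in this abstract combines the two distance measures it describes. The sketch below uses the standard per-point definition, s = (b - a) / max(a, b), where a is the mean distance to the other points of the same cluster and b is the smallest mean distance to the points of any other cluster. The abstract does not give a formula, so this definition, the Euclidean distance and the function name `silhouette_values` are assumptions for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_values(points, labels):
    """Per-point silhouette values; expects at least two distinct labels."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    values = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a point alone in its cluster scores 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members (dist(p, p) adds 0 to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        values.append((b - a) / max(a, b))
    return values
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping clusters. The sketch does not guard against the degenerate case where all points coincide, in which max(a, b) is zero.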
39	Approximation to K-Means-Type Clustering Wei, Yu 05 1900 Clustering involves partitioning a given data set into several groups based on some similarity/dissimilarity measurements. Cluster analysis has been widely used in information retrieval, text and web mining, pattern recognition, image segmentation and software reverse engineering. K-means is the most intuitive and popular clustering algorithm and the workhorse for clustering. However, classical K-means suffers from several flaws. First, the algorithm is very sensitive to the initialization method and can easily be trapped at a local minimum of the measure used in the model (the sum of squared errors). Moreover, it has been proved that finding a global minimum of the sum of squared errors is NP-hard even when k = 2. In the present model for K-means clustering, all the variables are required to be discrete and the objective is nonlinear and nonconvex. In the first part of the thesis, we consider how to derive an optimization model for the minimum sum of squared errors of a given data set based on continuous convex optimization. For this, we first reformulate K-means clustering as a novel optimization model, 0-1 semidefinite programming, in which the eigenvalues of the matrix argument must be 0 or 1. This provides a unified treatment of many other clustering approaches such as spectral clustering and normalized cut. The new optimization model also allows us to attack the original problem via relaxed linear and semidefinite programming. We then consider how to obtain a feasible solution of the original clustering problem from an approximate solution of the relaxed problem. Using principal component analysis, we construct a rounding procedure to extract a feasible clustering and show that our algorithm provides a 2-approximation to the global solution of the original problem. The complexity of our rounding procedure is O(n^(k^2(k-1)/2)), which substantially improves on a similar rounding procedure in the literature with complexity O(n^(k^3/2)). In particular, when k = 2, our rounding procedure runs in O(n log n) time. To the best of our knowledge, this is the lowest complexity reported in the literature for finding a solution to K-means clustering with guaranteed quality. In the second part of the thesis, we consider approximation methods for so-called balanced bi-clustering. Using a simple heuristic, we prove that we can slightly improve constrained K-means for bi-clustering. For the special case where the size of each cluster is fixed, we develop a new algorithm, called Q-means, to find a 2-approximation solution to balanced bi-clustering. We prove that Q-means has complexity O(n^2). Numerical results based on our approaches are reported in the thesis as well. / Thesis / Master of Science (MSc)
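The sensitivity to initialization described in this abstract can be reproduced on a four-point example. The sketch below runs plain Lloyd iterations (the classical K-means heuristic, not the thesis's semidefinite relaxation or rounding procedure) from two different starting points; the data set, the starting centroids and the function names `sse` and `lloyd` are hypothetical and chosen only to make the local minimum visible.

```python
def sse(points, centroids):
    """Sum of squared errors: each point contributes its squared distance
    to the nearest centroid."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(x, ctr)) for ctr in centroids)
        for x in points
    )


def lloyd(points, centroids, max_iter=100):
    """Classical K-means iterations until the centroids stop changing."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for x in points:
            j = min(
                range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(x, centroids[j])),
            )
            groups[j].append(x)
        # Mean of each group; an empty group keeps its old centroid.
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else list(c)
            for g, c in zip(groups, centroids)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


# Corners of a 4 x 1 rectangle, k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
trapped = lloyd(points, [[2.0, 0.0], [2.0, 1.0]])  # splits top row from bottom row
better = lloyd(points, [[0.0, 0.5], [4.0, 0.5]])   # splits left side from right side
print(sse(points, trapped), sse(points, better))   # 16.0 1.0
```

Both starting points are already fixed points of the iteration, so the first run never leaves a clustering whose sum of squared errors is sixteen times that of the second.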
40	Using UAV Mounted LiDAR to Estimate Plant Height and Growth Dhami, Harnaik Singh 09 September 2019 In this thesis, we develop algorithms to estimate crop heights as well as to detect plots in farms. Plant height estimation is needed in precision agriculture to monitor plant health and growth cycles. We use a 3D LiDAR mounted on an Unmanned Aerial Vehicle (UAV) and use the LiDAR data for height and plot estimation. We present a general methodology for extracting plant heights from 3D LiDAR with two specific variants for the two environments: row crops and pasture. The main algorithm is based on ground plane estimation from 3D LiDAR scans, which is then used to determine the height of plants in the scans. For row crops, the plot detection uses a K-means clustering algorithm to find the bounding boxes of these clusters, and a voting scheme to determine the best-fit width, height, and orientation of the clusters/plots. This best-fit box is then used to create a grid over the LiDAR data, and the plots are extracted. For pasture, relative heights are estimated using data collected weekly. Both algorithms were evaluated using data collected from actual farms and pasture. The accuracy in plot height estimation was +/- 5.36 % and that for growth estimates was +/- 7.91 %. / Master of Science / Plant height estimation and measurement is a vital task in farming. Knowing these characteristics helps determine whether the plants are growing healthily and when to harvest them. Along similar lines, accurate estimates of plant heights can be used to prevent overgrazing and undergrazing of pastures. However, as farm and plot sizes increase, getting consistent and accurate measurements becomes a more time-consuming and manually intensive task. Robots can help solve this problem because they can be used to estimate the height. With sensors that are already available, such as the 3D LiDAR that we use, aerial robots can fly over the farm and collect plant data. This data can then be processed to estimate the plant height, eliminating the need to go out and manually measure every single plant. This thesis discusses a methodology for doing exactly this, as well as for detecting plots within a farm. The algorithms are evaluated using data collected from actual farms and pasture. LiDAR Plant Height Estimation Plot Detection K-means Precision Agriculture
