Global ETD Search

21	Bagged clustering Leisch, Friedrich January 1999 (has links) (PDF) A new ensemble method for cluster analysis is introduced, which can be interpreted in two different ways: as a complexity-reducing preprocessing stage for hierarchical clustering and as a combination procedure for several partitioning results. The basic idea is to locate and combine structurally stable cluster centers and/or prototypes. Random effects of the training set are reduced by repeatedly training on resampled sets (bootstrap samples). We discuss the algorithm from both a theoretical and an applied point of view and demonstrate it on several data sets. (author's abstract) / Series: Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
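The idea in the abstract above (train a partitioning method on bootstrap samples, pool the resulting centers, then combine them hierarchically) can be sketched as follows. This is a minimal illustration, not the author's implementation: the choice of k-means as base method, average linkage for the merge step, and every function and parameter name (`k_base`, `n_boot`, `n_clusters`) are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the list of k centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def bagged_centers(points, k_base, n_boot, rng):
    """Run k-means on n_boot bootstrap samples and pool all centers."""
    pooled = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # draw with replacement
        pooled.extend(kmeans(sample, k_base, rng))
    return pooled


def merge_centers(centers, n_clusters):
    """Average-linkage agglomerative clustering of the pooled centers."""
    clusters = [[c] for c in centers]

    def linkage(a, b):
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def bagged_clustering(points, k_base, n_boot, n_clusters, seed=0):
    """Label each point by the merged cluster that owns its nearest pooled center."""
    rng = random.Random(seed)
    clusters = merge_centers(bagged_centers(points, k_base, n_boot, rng), n_clusters)
    return [
        min(
            range(len(clusters)),
            key=lambda c: min(math.dist(p, q) for q in clusters[c]),
        )
        for p in points
    ]
```

Choosing `k_base` larger than the final number of clusters lets the base method over-segment, so that the hierarchical step only has to merge a small set of centers rather than all data points.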
22	A K-MEANS BASED WATERSHED IMAGING SEGMENTATION ALGORITHM FOR BANANA CLUSTER QUALITY INSPECTION Castillo, Gregorio Alfonso 01 December 2016 (has links) The banana has become the most commonly consumed fresh fruit among the US population. Separating touching bananas with computer vision is a challenge; for this purpose a novel image segmentation algorithm is proposed that combines k-means and the watershed transformation. The first part extracts the background using K-means in the HS space. The second part segments individual bananas: the initial markers from which the watershed transformation grows are selected more carefully by fusing two morphological filters with different structuring elements. The proposed algorithm was validated on 124 experimentally captured banana pictures that were segmented manually. For background extraction, K-means in the HS space performed best of the three methods tested, with an average F1 score of 96.99%; Otsu and K-means (Lab) scored 82.58% and 88.06% respectively. The result of the watershed segmentation was also compared with the manual segmentation; the average F1 score is 92.28%. The performance would improve with modifications to the system, including more homogeneous illumination, restricting the positions allowed for the banana cluster, and a more suitable background. Color Image Segmentation HSI color space K-means Watershed
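The F1 scores quoted above compare an automatic segmentation with a manual one. A minimal sketch of a pixel-wise F1 computation is given below; it assumes both masks are flat sequences of 0/1 values of equal length, and the function name is hypothetical. The thesis may define the score differently (for example per banana rather than per pixel).

```python
def f1_score(predicted, reference):
    """F1 score of a binary mask against a reference mask (flat 0/1 sequences)."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)      # true positives
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)  # false positives
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 is the harmonic mean of precision and recall, a method that labels too much background as fruit is penalized as much as one that misses fruit pixels.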
23	Diseño de procesos para la segmentación de clientes según su comportamiento de compra y hábito de consumo en una empresa de consumo masivo / Process design for segmenting customers by purchasing behavior and consumption habits in a consumer goods company Rojas Araya, Javier Orlando January 2017 (has links) Master's in Business Engineering with Information Technologies / The consumer packaged food industry has evolved over time. The first sales channels for reaching end customers were neighborhood stores, which came under serious threat from the proliferation of large supermarket chains. The arrival of the internet also created a new channel that lets end customers order products, pay for them through mobile applications and receive them at home. Despite this evolution in channels, neighborhood stores refuse to disappear. Many customers still prefer the friendly, personalized service of these stores, together with a wide range of products and attractive prices. The company is no stranger to this reality and also sells its products to end customers through the supermarket and neighborhood store channels. In the store channel it serves approximately 25,000 customers per month nationwide, with a higher concentration in the central zone of the country. Segmenting these customers to understand their purchasing behavior and consumption habits has become the central axis of the strategy for this channel. Analyzing sales reports is no longer enough to raise the performance of the Commercial Area. The objective of this project is to group the customers of the company's store channel (Almacenes) according to purchasing behavior and consumption habits, and to characterize the resulting groups.
To reach this goal, the Business Engineering methodology is used, which proceeds from the definition of the strategic positioning through the business model, the process architecture, the detailed process design and the design of the supporting technology, to the construction and deployment of the solution. Algorithms suited to this type of task, such as DBSCAN and K-Means, are also used. The results segment the customers into seven groups for purchasing behavior and seven for consumption habits. This makes it possible to answer the questions of when, how much and what the channel's customers buy. The benefit of the project is an increase in sales, through actions that win back customers who are in the process of leaving and through a higher average ticket for customers who buy frequently but in very low invoiced amounts. / 07/04/2022 Market segmentation Business management - Chile K-means DBSCAN
24	An Empirical Analysis of Network Traffic: Device Profiling and Classification Anbazhagan, Mythili Vishalini 02 July 2019 (has links) Time and again we have seen the Internet grow and evolve at an unprecedented scale. The number of online users in 1995 was 40 million, but the number of online devices is predicted to reach 50 billion in 2020, which would be 7 times the human population on earth. Up until now, the revolution was in the digital world. Now the revolution is happening in the physical world that we live in: IoT devices are employed in all sorts of environments, such as homes, hospitals, industrial spaces and nuclear plants. Since they are employed in many mission-critical or even life-critical environments, their security and reliability are of paramount importance, because compromising them can lead to grave consequences. IoT devices are, by nature, different from conventional Internet-connected devices such as laptops and smartphones. They have small memory, limited storage and low processing power, and they operate with little to no human intervention. Hence it becomes very important to understand IoT devices better. How do they behave in a network? How different are they from traditional Internet-connected devices? Can they be identified from their network traffic? Is it possible for anyone to identify them just by looking at the network data that leaks outside the network, without even joining the network? Answering these questions is the aim of this thesis. To the best of our knowledge, no study has collected data from outside the network, without joining the network, with the intention of finding out whether IoT devices can be identified from this data. We also identify parameters that distinguish IoT from non-IoT devices. We then group similar devices manually and repeat the grouping automatically using clustering algorithms. This helps in grouping devices of a similar nature and in creating a profile for each kind of device.
IoT Network Data Analytics Machine Learning Clustering K-Means
25	Klusteranalys av cykelflödesdata för identifiering av viktiga faktorer och avvikande datapunkter / Cluster analysis of bicycle flow data for identifying important factors and deviating data points Hojeij, Mohamed, Tram, Alex January 2019 (has links) This study aims to provide a deeper understanding of the factors that affect bicycle flow in Malmö on a given day. We mainly examined two questions: what number of clusters is optimal for identifying deviating days, and which factors lie behind them in a time series of bicycle volume data? The method was a matching approach based on an experiment together with an evaluation. The work proceeded iteratively: the experiment searched for the right number of clusters, and the evaluation analysed the results the experiment produced. The data, collected from a bicycle counter located on Kaptensgatan in Malmö, were normalized so that the volume of cyclists would not affect the study. The purpose of our work is to identify deviating data points, and the factors with a large impact on bicycle flow, with the help of cluster analysis, since this can lead to better-informed decisions in urban and transport planning. If cyclists could be analysed with these factors eliminated, this would lead to further development and research of great importance for the City of Malmö. Using K-means cluster analysis with Euclidean distance, a distance measure used in similar fields, we found relevant clusters with deviating data points and factors with a large impact on bicycle flow. Our results show that 7 clusters, 2 of which were split further into 6 smaller clusters, were optimal for the study and best identified the factors with a large impact on the number of cyclists registered on certain days. The factors identified were events, festivals, football matches, concerts, school holidays, precipitation and public holidays. Cluster analysis K-Means Bicycle data Machine learning Engineering and Technology
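The abstract states that the counter data were normalized so that total volume does not influence the clustering, and that Euclidean distance was used. A minimal sketch of one way to do this follows. It assumes each day is a list of counts per time slot and that normalization means dividing by the daily total; the abstract does not specify the actual scheme, and the sample numbers are invented for illustration.

```python
import math


def normalize_profile(counts):
    """Scale one day's counts so they sum to 1 (assumed normalization scheme)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def euclidean(a, b):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical counts per time slot, not taken from the study.
weekday = [10, 80, 30, 90, 20]
busy_weekday = [20, 160, 60, 180, 40]  # same shape, twice the volume
event_day = [10, 20, 30, 90, 120]      # traffic shifted to the evening

same_shape = euclidean(normalize_profile(weekday), normalize_profile(busy_weekday))
different_shape = euclidean(normalize_profile(weekday), normalize_profile(event_day))
```

After normalization the two weekdays are indistinguishable, while the event day stays far from both, so a distance-based method such as K-means groups days by the shape of their profile rather than by how many cyclists passed.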
26	Algorithmes de classification répartis sur le cloud / Distributed clustering algorithms over a cloud computing platform Durut, Matthieu 28 September 2012 (has links) The subjects addressed in this thesis are inspired by research problems faced by the Lokad company. These problems relate to the challenge of designing efficient parallelization techniques for clustering algorithms on a Cloud Computing platform. Chapter 2 provides an introduction to Cloud Computing technologies, especially those devoted to intensive computations. Chapter 3 describes Microsoft's Cloud Computing offering, Windows Azure, in more detail. The following chapter covers technical aspects of cloud application development and provides some cloud design patterns. Chapter 5 is dedicated to the parallelization of a well-known clustering algorithm, Batch K-Means. It provides insights into the challenges of a cloud implementation of distributed Batch K-Means, especially the impact of communication costs on the efficiency of the implementation. Chapters 6 and 7 are devoted to the parallelization of another clustering algorithm, Vector Quantization (VQ). Chapter 6 analyses different parallelization schemes for VQ and presents the speedups to convergence that they provide. Chapter 7 provides a cloud implementation of these schemes. It highlights that it is the online nature of the VQ technique that enables an asynchronous cloud implementation, which drastically reduces the communication costs introduced in Chapter 5. K-means clustering Cloud computing
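To make the communication cost of distributed Batch K-Means concrete, one iteration can be sketched in a map/reduce style: each worker summarizes its data chunk as per-center sums and counts, and a reducer merges these summaries into new centers. This is a generic sketch under assumed names, not the implementation described in the thesis, and the workers run sequentially here rather than on separate machines.

```python
import math


def partial_stats(chunk, centers):
    """Worker side: per-center coordinate sums and point counts for one chunk."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in chunk:
        j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_and_update(stats, centers):
    """Reducer side: add up the workers' statistics and recompute the centers."""
    new_centers = []
    for j, old in enumerate(centers):
        n = sum(counts[j] for _, counts in stats)
        if n == 0:
            new_centers.append(old)  # no point assigned: keep the old center
            continue
        new_centers.append(
            tuple(sum(sums[j][d] for sums, _ in stats) / n for d in range(len(old)))
        )
    return new_centers


def distributed_step(chunks, centers):
    """One batch iteration; each partial_stats call stands in for a remote worker."""
    return merge_and_update([partial_stats(c, centers) for c in chunks], centers)
```

In this sketch every worker sends k * (dim + 1) numbers per iteration and must wait for the merged centers before starting the next one; that synchronization point is the kind of cost an asynchronous online scheme avoids.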
27	Klustringsanalys av driftarbanor i norska havet / Cluster analysis of drifter trajectories in the Norwegian Sea Brask, Axel, Fageräng, Rasmus January 2023 (has links) No description available. Clustering Spectral clustering K-means Convex sets Oceanography Drift measurements Mathematics
28	Improving Document Clustering by Refining Overlapping Cluster Regions Upadhye, Akshata Rajendra January 2022 (has links) No description available. Information Science document cluster overlapping embeddings purity silhouette K-means
29	Clustering Analysis of Nuclear Proliferation Resistance Measures Jankovsky, Zachary Kyle 02 October 2014 (has links) No description available. Nuclear Engineering
30	Using Hadoop to Cluster Data in Energy System Hou, Jun 03 June 2015 (has links) No description available. Computer Science Hadoop K-means energy data clustering analysis
