31 |
Efficient Hierarchical Clustering Techniques For Pattern Classification
Vijaya, P A 07 1900
No description available.
|
32 |
An Efficient Framework for Processing and Analyzing Unstructured Text to Discover Delivery Delay and Optimization of Route Planning in Realtime / Un framework efficace pour le traitement et l'analyse des textes non structurés afin de découvrir les retards de livraison et d'optimiser la planification de routes en temps réel
Alshaer, Mohammad 13 September 2019
The Internet of Things (IoT) is leading to a paradigm shift within the logistics industry and is changing the logistics service management ecosystem. Logistics service providers today use sensor technologies such as GPS or telemetry to collect data in real time while a delivery is in progress. Real-time data collection enables service providers to track and manage their shipment process efficiently. Its key advantage is that it lets providers act proactively to prevent outcomes such as delivery delays caused by unexpected or unknown events. Furthermore, providers today tend to use data from external sources such as Twitter, Facebook, and Waze, because these sources provide critical information about events such as traffic, accidents, and natural disasters. Data from such external sources enrich the dataset and add value to the analysis. Moreover, collecting them in real time makes it possible to analyze the data on the fly and prevent unexpected outcomes (such as delivery delay) at run time. However, the data are collected raw and must be processed before they can be analyzed effectively. Collecting and processing data in real time is an enormous challenge, mainly because the data come from heterogeneous sources at very high speed. The speed and variety of the data make it difficult to perform complex processing operations such as cleansing, filtering, and handling incorrect data. The variety of data (structured, semi-structured, and unstructured) complicates processing in both batch and real-time modes, because different types of data may require different techniques. A technical framework that can process heterogeneous data is very difficult to build and is not currently available.
In addition, performing data processing operations in real time is very challenging; efficient techniques are required to handle high-speed data, which conventional logistics information systems cannot do. Therefore, to exploit Big Data in logistics service processes, an efficient solution for collecting and processing data in both real-time and batch modes is critically important. In this thesis, we developed and experimented with two data processing solutions: SANA and IBRIDIA. SANA is built on a Multinomial Naïve Bayes classifier, whereas IBRIDIA relies on Johnson's hierarchical clustering (HCL) algorithm, which is a hybrid technology that enables data collection and processing in both batch and real-time modes. SANA is a service-based solution that deals with unstructured data. It serves as a multi-purpose system for extracting relevant events, including the context of each event (such as place, location, and time). In addition, it can be used to perform text analysis over the targeted events. IBRIDIA was designed to process unknown data from external sources and cluster them on the fly in order to gain an understanding of the data, which helps in extracting events that may lead to delivery delay.
According to our experiments, both approaches show a unique ability to process logistics data. However, SANA was found to be more promising, since its underlying technology (the Naïve Bayes classifier) outperformed IBRIDIA on the performance measures considered. SANA was designed to generate a knowledge graph from events as soon as they are collected in real time, without any waiting, thus drawing maximum benefit from these events. IBRIDIA, in turn, is valuable within the logistics domain for identifying the most influential category of events affecting delivery. Unfortunately, IBRIDIA must wait for a minimum number of events to arrive and therefore always has a cold start. Because we are interested in re-optimizing the route on the fly, we adopted SANA as our data processing framework.
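To illustrate the kind of classifier SANA is described as building on, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the thesis's implementation: the function names, the two labels ("delay" and "other"), and the toy messages are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {word for doc in docs for word in doc.split()}
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, n_docs in docs_per_class.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # Smoothed denominator: word tokens in the class plus alpha per vocabulary word.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_like"][label] = {
            word: math.log((word_counts[label][word] + alpha) / total)
            for word in vocab
        }
    return model


def predict(model, doc):
    """Return the label with the highest log-posterior; unseen words are ignored."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for word in doc.split():
            if word in model["vocab"]:
                score += model["log_like"][label][word]
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: short messages labelled as delay-related or not.
TRAIN_DOCS = [
    "accident on highway",
    "heavy traffic jam downtown",
    "road closed after accident",
    "sunny day at the park",
    "great coffee this morning",
    "new movie tonight",
]
TRAIN_LABELS = ["delay", "delay", "delay", "other", "other", "other"]
MODEL = train_multinomial_nb(TRAIN_DOCS, TRAIN_LABELS)
```

Because each incoming message can be scored independently as soon as it arrives, a classifier of this kind needs no warm-up batch, which is consistent with the abstract's remark that SANA avoids the cold start of the clustering approach.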
|
33 |
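MP-Draughts - Um Sistema Multiagente de Aprendizagem Automática para Damas Baseado em Redes Neurais de Kohonen e Perceptron Multicamadas / MP-Draughts: A Multiagent Machine Learning System for Draughts Based on Kohonen Neural Networks and Multilayer Perceptrons
Duarte, Valquíria Aparecida Rosa 17 July 2009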
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The goal of this work is to present MP-Draughts (MultiPhase-Draughts), a multiagent environment for Draughts in which one agent, named IIGA (Initial/Intermediate Game Agent), is built and trained to specialize in the initial and intermediate phases of the game, and the remaining 25 agents specialize in the final phases. Each agent of MP-Draughts is a neural network that learns almost without human supervision (unlike the world champion agent Chinook). MP-Draughts results from a continuing line of research whose previous product was the efficient agent VisionDraughts. Despite its good general performance, VisionDraughts frequently fails in the final phases of a game, even when it is in an advantageous position compared to its opponent (for instance, by getting into endgame loops). To reduce this misbehavior during endgames, MP-Draughts relies on 25 agents specialized for endgame phases, each trained to deal with a particular cluster of endgame board states. These 25 clusters are mined by a Kohonen-SOM network from a database containing a large number of endgame board states. Once trained, MP-Draughts operates as follows: first, an optimized version of VisionDraughts is used as the IIGA; next, the endgame agent that represents the cluster that best fits the current endgame board state replaces it until the end of the game. This work shows that this strategy significantly improves the general performance of the player agents. / Master in Computer Science
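The agent hand-off described in the abstract can be sketched as a nearest-prototype lookup. This is only an illustration under assumptions not stated in the source: board states are assumed to be encoded as numeric feature vectors, each cluster is assumed to be summarised by one prototype vector (as a trained SOM unit would be), and "best fits" is assumed to mean smallest Euclidean distance. All names are hypothetical.

```python
import math


def best_matching_cluster(board_features, prototypes):
    """Return the index of the prototype closest (Euclidean) to the board vector."""
    best_index, best_dist = -1, math.inf
    for index, prototype in enumerate(prototypes):
        dist = math.dist(board_features, prototype)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def choose_agent(board_features, is_endgame, prototypes):
    """Use the IIGA before the endgame, then the agent of the best-fitting cluster."""
    if not is_endgame:
        return "IIGA"
    return f"endgame_agent_{best_matching_cluster(board_features, prototypes)}"


# Hypothetical 2-D prototypes standing in for the 25 cluster representatives.
PROTOTYPES = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
```

With these toy prototypes, `choose_agent([0.9, 0.1], True, PROTOTYPES)` selects the agent for cluster 1, while the same board outside the endgame stays with the IIGA.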
|
34 |
Raisonnement approximatif pour la détection et l'analyse de changements / Approximate reasoning for the detection and analysing of changes
Haouas, Fatma 25 September 2019
This thesis results from the interaction of two disciplines: change detection in multitemporal images and evidential reasoning using the Dempster-Shafer theory (DST). Addressing change detection and analysis with the DST requires determining an exhaustive and exclusive frame of discernment. This is difficult when no prior information about the images is available. In this research work, we propose a new clustering algorithm based on the Fuzzy-C-Means (FCM) algorithm to define the existing semantic classes. The idea of this algorithm is to represent each class by a varying number of centroids in order to characterize the classes better. To ensure that the frame of discernment is exhaustive, we propose a new cluster validity index that identifies the optimal number of semantic classes. The third contribution exploits the position of each pixel relative to the class centroids, together with its membership degrees, to define the mass distribution that represents the information. The distinctive features of the proposed distribution are that it generates a reduced set of focal elements and respects the mathematical axioms when performing the fuzzy-to-mass transformation. We emphasize the capacity of evidential conflict to indicate multitemporal transformations. Our reasoning is based on decomposing the global conflict and estimating the partial conflicts between pairs of focal elements in order to measure the conflict caused by change. This strategy makes it possible to identify the pair of classes involved in the change. To quantify this conflict, we propose a new change measure, denoted CM. Finally, we propose an algorithm that deduces the binary change map from the partial conflicts map.
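The idea of decomposing the global conflict into partial conflicts can be sketched with the standard Dempster-Shafer definition, in which the conflict between two mass functions is the sum of the products m1(A)·m2(B) over all pairs of focal elements with empty intersection. The sketch below shows only that textbook quantity; the thesis's CM measure and its fuzzy-to-mass transformation are not specified in the abstract and are not reproduced here. The class names and mass values are hypothetical.

```python
from itertools import product


def partial_conflicts(m1, m2):
    """Map each pair of disjoint focal elements (A, B) to m1(A) * m2(B)."""
    conflicts = {}
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        if not (a & b):  # empty intersection: the two sources contradict each other
            conflicts[(a, b)] = mass_a * mass_b
    return conflicts


def global_conflict(m1, m2):
    """Dempster's conflict K, the sum of all partial conflicts."""
    return sum(partial_conflicts(m1, m2).values())


# Hypothetical masses for one pixel at two acquisition dates.
M_DATE_1 = {frozenset({"vegetation"}): 0.7, frozenset({"vegetation", "urban"}): 0.3}
M_DATE_2 = {frozenset({"urban"}): 0.8, frozenset({"vegetation", "urban"}): 0.2}
```

Here the only disjoint pair is ({vegetation}, {urban}), so the whole conflict is attributable to a vegetation-to-urban transition, which is the kind of class-pair identification the abstract describes.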
|
35 |
Scalable Parallel Machine Learning on High Performance Computing Systems – Clustering and Reinforcement Learning
Weijian Zheng (14226626) 08 December 2022
High-performance computing (HPC) and machine learning (ML) have been widely adopted by both academia and industry to address enormous data problems at extreme scales. While research has reported on the interactions of HPC and ML, achieving high performance and scalability for parallel and distributed ML algorithms is still a challenging task. This dissertation first summarizes the major challenges in applying HPC to ML applications: 1) poor performance and scalability, 2) loss of the convergence rate, 3) lower quality of the trained model, and 4) a lack of performance optimization techniques designed for specific applications. Researchers can address these four challenges in new ML applications. This dissertation shows how to solve them for two specific applications: 1) a clustering algorithm and 2) graph optimization algorithms that use reinforcement learning (RL).
For the clustering algorithm, we first propose an algorithm called the simulated-annealing enhanced clustering algorithm. By combining a blocked data layout and asynchronous local optimization within each thread, the simulated-annealing enhanced clustering algorithm has a convergence rate comparable to that of the K-means algorithm but with much higher performance. Experiments with synthetic and real-world datasets show that it is significantly faster than the MPI K-means library using up to 1,024 cores. However, the optimization costs (Sum of Square Error (SSE)) of the simulated-annealing enhanced clustering algorithm became higher than the original costs. To tackle this problem, we devise a new algorithm called the full-step feel-the-way clustering algorithm. In the full-step feel-the-way algorithm, there are L local steps within each block of data points. We use the first local step's results to compute accurate global optimization costs. Our results show that the full-step algorithm can significantly reduce the global number of iterations needed to converge while obtaining low SSE costs. However, the time spent on the local steps is greater than the benefit of the saved iterations. To improve performance, we next optimize the local step time by incorporating a sampling-based method called reassignment-history-aware sampling. Extensive experiments with various synthetic and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that our parallel algorithms can outperform the fastest open-source MPI K-means implementation by up to 110% on 4,096 CPU cores with comparable SSE costs.
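The SSE cost that the abstract uses to compare clustering quality can be sketched with a plain, serial K-means step. This is only the baseline Lloyd iteration, shown to make the cost concrete; it is not the simulated-annealing enhanced or feel-the-way algorithm, and it omits the blocked layout, threading, and sampling the abstract describes. The function names and the toy points are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def sse(points, centroids):
    """Sum of squared errors: squared distance of each point to its nearest centroid."""
    total = 0.0
    for point in points:
        centre = centroids[nearest(point, centroids)]
        total += sum((p - c) ** 2 for p, c in zip(point, centre))
    return total


def lloyd_step(points, centroids):
    """One K-means iteration: assign points, then move each centroid to its group mean."""
    groups = [[] for _ in centroids]
    for point in points:
        groups[nearest(point, centroids)].append(point)
    updated = []
    for k, group in enumerate(groups):
        if group:
            updated.append([sum(column) / len(group) for column in zip(*group)])
        else:
            updated.append(list(centroids[k]))  # keep an empty cluster's centroid unchanged
    return updated


# Hypothetical toy data: two well-separated pairs of points.
POINTS = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
START = [[0.0, 0.0], [10.0, 0.0]]
```

On this toy data a single step moves the centroids to the midpoints of each pair and halves the SSE, which is the per-iteration improvement that a faster but less exact local optimization has to trade against.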
Our evaluations of the sampling-based feel-the-way algorithm establish the effectiveness of the local optimization strategy, the blocked data layout, and the sampling methods for addressing the challenges of applying HPC to ML applications. To explore more parallel strategies and optimization techniques, we focus on a more complex application: graph optimization problems using reinforcement learning (RL). RL has proved successful for automatically learning good heuristics to solve graph optimization problems. However, existing RL systems either do not support graph RL environments or do not support multiple GPUs in a distributed setting. This lack of parallelization and scalability has compromised RL's ability to solve large-scale graph optimization problems. To address these challenges, we develop OpenGraphGym-MG, a high-performance distributed-GPU RL framework for solving graph optimization problems. OpenGraphGym-MG focuses on a class of computationally demanding RL problems in which both the RL environment and the policy model are highly computation intensive. In this work, we distribute large-scale graphs across distributed GPUs and use spatial parallelism and data parallelism to achieve scalable performance. We compare and analyze the performance of spatial and data parallelism and highlight their differences. To support graph neural network (GNN) layers that take data samples partitioned across distributed GPUs as input, we design new parallel mathematical kernels to perform operations on distributed 3D sparse and 3D dense tensors. To handle costly RL environments, we design new parallel graph environments to scale up all RL-environment-related operations. By combining the scalable GNN layers with the scalable RL environment, we are able to develop high-performance parallel OpenGraphGym-MG training and inference algorithms.
To summarize, after identifying the major challenges in applying HPC to ML applications, this thesis explores several parallel strategies and performance optimization techniques using two ML applications. Specifically, we propose a local optimization strategy, a blocked data layout, and sampling methods for accelerating the clustering algorithm, and we create a spatial parallelism strategy; a parallel graph environment, agent, and policy model; an optimized replay buffer; and a multi-node selection strategy for solving large optimization problems over graphs. Our evaluations demonstrate the effectiveness of these strategies and show that our accelerations can significantly outperform state-of-the-art ML libraries and frameworks without loss of quality in the trained models.
|