1 |
How to Control Clustering Results?
Hahmann, Martin, Volk, Peter B., Rosenthal, Frank, Habich, Dirk, Lehner, Wolfgang 19 January 2023
One of the most important and challenging questions in clustering is how to choose the best-fitting algorithm and parameterization to obtain an optimal clustering for the data at hand. The clustering aggregation concept tries to bypass this problem by generating a set of separate, heterogeneous partitionings of the same data set, from which an aggregate clustering is derived. As of now, almost every existing aggregation approach combines given crisp clusterings on the basis of pair-wise similarities. In this paper, we consider an input set of soft clusterings and show that it contains additional information that can be used efficiently for the aggregation. Our approach introduces an extension of these pair-wise similarities that allows control and adjustment of the aggregation process and its result. Our experiments show that our flexible approach offers adaptive results, improved identification of structures and high usability.
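To illustrate why soft clusterings carry more information than crisp ones for pair-wise aggregation, here is a minimal sketch. It is an assumption made for illustration, not the authors' method: the function name `co_association` and the sample data are hypothetical. The pair-wise similarity of two objects within one clustering is taken as the dot product of their membership vectors, and the similarities are averaged over all clusterings. With crisp (0/1) memberships this reduces to the fraction of clusterings that put both objects in the same cluster.

```python
def co_association(clusterings):
    """Average pair-wise co-membership over several clusterings.

    clusterings: list of clusterings; each clustering is a list of
    membership vectors (one per object) whose entries sum to 1.
    Cluster labels need not align across clusterings, because each
    dot product is taken inside a single clustering.
    """
    n = len(clusterings[0])
    matrix = [[0.0] * n for _ in range(n)]
    for memberships in clusterings:
        for i in range(n):
            for j in range(n):
                # Dot product of the two membership vectors.
                matrix[i][j] += sum(
                    a * b for a, b in zip(memberships[i], memberships[j])
                )
    return [[value / len(clusterings) for value in row] for row in matrix]


# Hypothetical soft clusterings of three objects into two clusters.
soft_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
soft_b = [[0.6, 0.4], [0.5, 0.5], [0.0, 1.0]]
similarity = co_association([soft_a, soft_b])

# The same objects with crisp (hard) assignments, for comparison.
crisp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
crisp_similarity = co_association([crisp])
```

In the crisp case objects 0 and 1 are simply "together" and objects 0 and 2 are "apart". The soft case yields graded values in between, which is the kind of extra information an aggregation step could weight or threshold.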
|
2 |
Extracting Customer Sentiments from Email Support Tickets: A case for email support ticket prioritisation
Fiati-Kumasenu, Albert January 2019
Background: Daily, companies generate enormous amounts of customer support tickets, which are grouped and placed in specialised queues based on some characteristics, from where they are resolved by the customer support personnel (CSP) on a first-in-first-out basis. Given that these tickets require different levels of urgency, a logical next step to improving the effectiveness of the CSPs is to prioritise the tickets based on business policies. Among the several heuristics that can be used in prioritising tickets is sentiment polarity.
Objectives: This study investigates how machine learning methods and natural language techniques can be leveraged to automatically predict the sentiment polarity of customer support tickets using.
Methods: Using a formal experiment, the study examines how well Support Vector Machine (SVM), Naive Bayes (NB) and Logistic Regression (LR) based sentiment polarity prediction models built for product and movie reviews can be used to make sentiment predictions on email support tickets. Because the set of annotated email support tickets is limited in size, the Valence Aware Dictionary and sEntiment Reasoner (VADER) and a cluster ensemble (using k-means, affinity propagation and spectral clustering) are also investigated for making sentiment polarity predictions.
Results: Compared to NB and LR, SVM performs better, scoring an average f1-score of .71, whereas NB scores lowest with an f1-score of .62. SVM combined with the presence vector outperformed the frequency and TF-IDF vectors with an f1-score of .73, while NB records an f1-score of .63. With an average f1-score of .23, the models transferred from the movie and product reviews performed inadequately, even when compared with a dummy classifier with an average f1-score of .55. Finally, the cluster ensemble method outperformed VADER, with f1-scores of .61 and .53 respectively.
Conclusions: Given the results, SVM combined with a presence vector of bigrams and trigrams is a candidate solution for extracting sentiments from email support tickets. Additionally, transferring sentiment models from the movie and product reviews domain to email support tickets is not possible. Finally, given that there exists a limited dataset for conducting sentiment analysis studies in the Swedish and the customer support context, a cluster ensemble is recommended as a sample selection method for generating annotated data.
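The difference between the presence and frequency vectors mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`ngrams`, `build_vocabulary`, `vectorize`), the whitespace tokenisation and the sample ticket texts are all hypothetical and not taken from the thesis. A presence vector records whether a bigram or trigram occurs at all, while a frequency vector records how often it occurs.

```python
from collections import Counter


def ngrams(tokens, n_values=(2, 3)):
    """Return all bigrams and trigrams of a token list as strings."""
    grams = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(documents):
    """Map every n-gram seen in the documents to a column index."""
    grams = {g for doc in documents for g in ngrams(doc.lower().split())}
    return {gram: index for index, gram in enumerate(sorted(grams))}


def vectorize(document, vocabulary, presence=True):
    """Encode a document as a presence (0/1) or frequency (count) vector."""
    counts = Counter(ngrams(document.lower().split()))
    vector = [0] * len(vocabulary)
    for gram, count in counts.items():
        if gram in vocabulary:
            vector[vocabulary[gram]] = 1 if presence else count
    return vector


# Made-up ticket texts, used only to show the two encodings.
tickets = ["not working not working", "works fine"]
vocabulary = build_vocabulary(tickets)
presence_vector = vectorize(tickets[0], vocabulary, presence=True)
frequency_vector = vectorize(tickets[0], vocabulary, presence=False)
```

Vectors of this kind would then be passed to a classifier such as an SVM; that step is omitted here because the abstract does not give the model configuration.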
|
3 |
Ensemble de agrupamentos para sistemas de recomendação baseados em conteúdo / Cluster ensemble to content-based recommender systems
Costa, Fernando Henrique da Silva 05 November 2018
The accelerated growth of the internet has made a large amount of information accessible to users. Although this amount of information has some advantages, users who have little or no experience in choosing among several alternatives will find it difficult to find useful information (or items, considering the scope of this work) that meets their needs. In this context, recommender systems have been developed to help users find relevant and personalized items. Such systems are divided into several architectures, such as content-based, collaborative filtering and knowledge-based. The first architecture was explored in this work. The content-based architecture recommends items to the user based on their similarity to items in which the user has shown interest in the past. Consequently, this architecture has the limitation of generally making recommendations with low serendipity, since the recommended items tend to be similar to those already seen by the user and therefore offer no novelty or surprise. Given this limitation, serendipity is highlighted in the discussions presented in this work. Thus, the objective of this work is to reduce the problem of low serendipity in recommendations through partial similarity analysis implemented using a cluster ensemble.
To achieve this goal, content-based recommendation strategies implemented using clustering and cluster ensembles were proposed and evaluated. The evaluation involved a qualitative analysis of the recommendations and a study with users. In that study, four news recommendation strategies were evaluated: the two strategies proposed in this work, a strategy based on random recommendation, and a strategy based on co-clustering. The evaluations considered the relevance, surprise and serendipity of recommendations. This last aspect describes items that are both surprising and relevant to the user. The results of both analyses showed the feasibility of using clustering as the basis of recommendation: the cluster ensemble had satisfactory results in all aspects, mainly in surprise, whereas the simple clustering-based strategy obtained the best results in relevance and serendipity.
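The relationship between the three evaluation aspects can be made concrete with a small sketch. It assumes, hypothetically, that each recommended item receives a yes/no relevance judgment and a yes/no surprise judgment from a user; the function name `evaluation_rates` and the sample judgments are illustrative, and the thesis's actual evaluation protocol is not described here. Following the definition above, an item counts as serendipitous only when it is both relevant and surprising.

```python
def evaluation_rates(judgments):
    """Compute relevance, surprise and serendipity rates.

    judgments: list of (relevant, surprising) boolean pairs,
    one pair per recommended item.
    """
    total = len(judgments)
    if total == 0:
        return {"relevance": 0.0, "surprise": 0.0, "serendipity": 0.0}
    relevant = sum(1 for r, s in judgments if r)
    surprising = sum(1 for r, s in judgments if s)
    # Serendipity requires both properties at once.
    serendipitous = sum(1 for r, s in judgments if r and s)
    return {
        "relevance": relevant / total,
        "surprise": surprising / total,
        "serendipity": serendipitous / total,
    }


# Made-up judgments for four recommended news items.
rates = evaluation_rates(
    [(True, False), (True, True), (False, True), (False, False)]
)
```

Because serendipity is a conjunction, its rate can never exceed the relevance rate or the surprise rate, which is why a strategy can lead on surprise without leading on serendipity.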
|
5 |
Design, Implementation and Analysis of a Description Model for Complex Archaeological Objects / Elaboration, mise en œuvre et analyse d’un modèle de description d’objets archéologiques complexes
Ozturk, Aybuke 09 July 2018
Ceramics are one of the most important archaeological materials for reconstructing past civilizations. Information about complex ceramic objects is composed of textual, numerical and multimedia data, which raises several research challenges addressed in this thesis. From a technical perspective, ceramic databases have different file formats, access protocols and query languages. From a data perspective, ceramic data are heterogeneous and experts have different ways of representing and storing data. There is no standardized content or terminology, especially for the description of ceramics. Moreover, data navigation and observation are difficult. Data integration is also difficult due to the presence of various dimensions from distant databases, which describe the same categories of objects in different ways. Therefore, the research project presented in this thesis aims to provide archaeologists and archaeological scientists with tools for enriching their knowledge by combining different information on ceramics. We divide our work into two complementary parts: (1) Modeling of Complex Archaeological Data and (2) Clustering Analysis of Complex Archaeological Data.
The first part of this thesis is dedicated to the design of a complex archaeological database model for the storage of ceramic data. This database is also used to feed a data warehouse for online analytical processing (OLAP). The second part of the thesis is dedicated to an in-depth clustering (categorization) analysis of ceramic objects. To do this, we propose a fuzzy approach, in which a ceramic object may belong to more than one cluster (category). Such a fuzzy approach is well suited for collaborating with experts, as it opens new discussions based on clustering results. We contribute to fuzzy clustering in three sub-tasks: (i) a novel fuzzy clustering initialization method that keeps the complexity of the approach linear; (ii) an innovative quality index that allows finding the optimal number of clusters; and (iii) the Multiple Clustering Analysis approach, which builds smart links between visual, textual and numerical data and thereby helps combine all types of ceramic information. Moreover, the methods we propose could also be adapted to other application domains such as economics or medicine.
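As a rough picture of what "belonging to more than one cluster" means numerically, the sketch below computes membership degrees with the standard fuzzy c-means membership formula. Using that formula is an assumption: the abstract does not state which fuzzy algorithm the thesis builds on, and the function name `fuzzy_memberships`, the fuzzifier `m = 2` and the sample coordinates are hypothetical.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Membership degree of one point in each cluster (fuzzy c-means rule).

    point: coordinates of the object.
    centers: list of cluster-center coordinates.
    m: fuzzifier (> 1); larger values give softer memberships.
    """
    distances = [math.dist(point, center) for center in centers]
    # A point lying exactly on a center belongs fully to it
    # (shared evenly if several centers coincide with the point).
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [
            1.0 / len(zeros) if i in zeros else 0.0
            for i in range(len(centers))
        ]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# A made-up object at distance 1 from one center and distance 2 from another.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships always sum to 1, so an object near the boundary between two categories is reported as partly belonging to both rather than being forced into one.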
|