Global ETD Search

11	Predictive maintenance using NLP and clustering support messages
Yilmaz, Ugur, January 2022
Communication with customers is a major part of the customer experience as well as a rich source of data for mining. More businesses are engaging with consumers via text messages. Before 2020, 39% of businesses already used some form of text messaging to communicate with their consumers, and many more were expected to adopt the technology after 2020 [1]. Email response rates are merely 8%, compared with 45% for text messaging [2]. A significant portion of this communication consists of customer enquiries or support messages sent in both directions. According to estimates, more than 80% of today's data is stored in an unstructured format (such as text, image, audio, or video) [3], and a significant portion of it is expressed in ambiguous natural language. Such data is usually analyzed with qualitative data analysis techniques. To automate the examination of huge corpora of textual material, researchers have turned to natural language processing techniques [4]. In light of the statistics above, Billogram [5] decided that support messages between creditors and recipients can be mined for predictive maintenance purposes, such as early identification of an outlier like a bug, defect, or wrongly built feature. Stated in one sentence, Billogram is looking for an answer to "why are people reaching out to begin with?" This thesis project discusses the implementation of unsupervised clustering of support messages using natural language processing methods, together with performance metrics for the results, to answer Billogram's question. The research also covers intent recognition for the clustered messages in two different ways, one automatic and one semi-manual, and the results are discussed and compared. The first approach, LDA with manual intent assignment, yielded 100 topics and a coherence score of 0.293. The second approach produced 158 clusters with UMAP and HDBSCAN, with automatic intent recognition. Creating clusters will help identify issues that can be subjects of increased focus, automation, or even down-prioritization. Therefore, this research falls within the predictive maintenance [9] area. This study, which will improve over time with more iterations in the company, also contains the preliminary work for "labeling" or "describing" clusters and their intents.
Keywords: Predictive maintenance; support messages; NLP; unsupervised clustering; intent recognition; LDA; UMAP; HDBSCAN; BERT; Swedish BERT (KB-BERT); Billogram
12	Video summarization with unsupervised clustering techniques / Περίληψη βίντεο με μη επιβλεπόμενες τεχνικές ομαδοποίησης
Μπεσύρης, Δημήτριος, 11 October 2013
The rapid growth in processing power and storage capacity in recent years, across various fields of computing and image/video understanding, has given new impetus to video management, browsing, indexing, summarization, and retrieval. Video summarization techniques were developed to manage this information. Summarizing a video as a static sequence of key frames reduces the amount of information required for video searching, while providing the basis for understanding semantic content in video retrieval applications. The research subject of this doctoral thesis is the incorporation of graph theory and unsupervised clustering algorithms into automatic video summarization of large video sequences. In this context, a frame of a video sequence is not processed as a discrete element; instead, the degree of correlation between frames is taken into account. The clustering problem is thus transformed from a typical group identification procedure into the problem of analyzing the structure contained in the data. The thesis also presents a new technique for improving the similarity measure between frames, based on the theoretical formalism of semi-supervised learning but using dynamic compression algorithms to represent the visual content of the frames. Detailed experimental results demonstrate the performance improvement provided by the proposed methods in comparison with well-known video summarization techniques from the literature. Finally, future research directions are proposed, directly applicable to the fields of image and video retrieval.
Keywords: Video summarization; Graph theory; Multiset theory; Unsupervised clustering; Fuzzy clustering; Data compression; Sequence analysis; 621.367
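
The idea of treating frames as related nodes and not as isolated elements can be illustrated with a minimal sketch. It is not the method of the thesis: it assumes each frame is already described by a numeric feature vector (for example a colour histogram), links two frames when their cosine similarity reaches a hypothetical `threshold`, treats connected components of that graph as clusters, and picks the best-connected frame of each cluster as its key frame. The names `cosine`, `key_frames`, and `threshold` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def key_frames(features, threshold=0.9):
    """Cluster frames via a thresholded similarity graph; return one key frame per cluster."""
    n = len(features)
    # Build the graph: an edge links two frames whose similarity reaches the threshold.
    neighbours = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if cosine(features[i], features[j]) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    # Connected components of the graph act as clusters (depth-first traversal).
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(component))

    # Key frame: the member with the most edges; ties go to the earliest frame.
    return [max(c, key=lambda i: (len(neighbours[i]), -i)) for c in clusters]


# Hypothetical 3-bin histograms for five frames (assumed data, not from the thesis).
frames = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.95, 0.05], [0, 0, 1]]
print(key_frames(frames))  # [0, 2, 4]
```

A thresholded graph is the simplest possible structure here; weighted graphs or spectral methods would keep more of the similarity information at the cost of more machinery.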
13	Categorization of Customer Reviews Using Natural Language Processing / Kategorisering av kundrecensioner med naturlig språkbehandling
Liliemark, Adam; Enghed, Viktor, January 2021
Databases of user-generated data can quickly become unmanageable. Klarna faced this issue with a database of around 700,000 customer reviews. Ideally, the database would be cleaned of uninteresting reviews and the remaining reviews categorized. Because the categories were not known in advance, the idea was to use an unsupervised clustering algorithm to find them. This thesis describes the work carried out to solve this problem and proposes a solution for Klarna that involves artificial neural networks rather than unsupervised clustering. Our implementation is able to categorize reviews as either interesting or uninteresting. We propose a workflow that would make it possible to categorize reviews not only into these two categories but into multiple ones. The method revolved around experimentation with clustering algorithms and neural networks. Previous research shows that texts can be clustered; however, the datasets used seem to be vastly different from the Klarna dataset, which consists of short reviews and contains a large number of uninteresting ones. Unsupervised clustering yielded unsatisfactory results, as no discernible categories could be found. In some cases, the technique created clusters of uninteresting reviews. These clusters were used as training data for an artificial neural network, together with manually labeled interesting reviews. The results from this network were satisfactory: it can say with an accuracy of around 86% whether a review is interesting or not. This was achieved using the aforementioned clusters and five feedback loops, in which the reviews from an evaluation dataset that the model predicted wrongly were fed back to it as training data.
We argue that the main reason unsupervised clustering failed is that the reviews are too short. In comparison, other researchers have successfully clustered text data with an average length in the hundreds of words. Such items pack many more features than the short reviews in the Klarna dataset. We show that an artificial neural network is able to detect these features despite the short length, through its intrinsic design. Further research into feature extraction from short text strings could provide the means to cluster this kind of data. If features can be extracted, clustering can be done on the features rather than on the actual words. Our artificial neural network shows that the arbitrary features "interesting" and "uninteresting" can be extracted, so we are hopeful that future researchers will find ways of extracting more features from short text strings. In theory, this should mean that text of all lengths can be clustered without supervision.
Keywords: Machine Learning; Natural Language Processing; Unsupervised Clustering; Artificial Neural Network; Text Categorization; Computer and Information Sciences
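
The feedback loop described in this abstract, where wrongly predicted evaluation reviews are fed back as training data, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small word-count classifier (naive Bayes style, without class priors) stands in for the artificial neural network, misclassified items are moved out of the evaluation set once they are reused, and all names (`train`, `predict`, `feedback_training`) and example reviews are hypothetical.

```python
from collections import Counter
from math import log


def train(examples):
    """Fit per-label word counts from (text, label) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Return the label with the highest add-one-smoothed log-likelihood."""
    vocab = {w for c in counts.values() for w in c}

    def score(label):
        # One extra slot in the denominator reserves mass for unseen words.
        total = sum(counts[label].values()) + len(vocab) + 1
        return sum(log((counts[label][w] + 1) / total) for w in text.lower().split())

    return max(sorted(counts), key=score)


def feedback_training(train_set, eval_set, rounds=5):
    """Retrain up to `rounds` times, moving misclassified evaluation items into training."""
    train_set, eval_set = list(train_set), list(eval_set)
    model = train(train_set)
    for _ in range(rounds):
        wrong = [(t, y) for t, y in eval_set if predict(model, t) != y]
        if not wrong:
            break  # nothing left to learn from this evaluation set
        train_set.extend(wrong)
        eval_set = [ex for ex in eval_set if ex not in wrong]
        model = train(train_set)
    return model, train_set
```

Calling `feedback_training(seed, evaluation)` with two lists of `(text, label)` pairs returns the final model and the grown training set. Because the evaluation items end up in the training data, accuracy measured on that same set afterwards is optimistic; a separate held-out set would be needed for an honest estimate.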
14	Toward a multi-scale understanding of flower development - from auxin networks to dynamic cellular patterns / Vers une compréhension multi-échelle du développement floral : des réseaux auxiniques aux patrons de la dynamique cellulaire
Legrand, Jonathan, 07 November 2014
A striking aspect of flowering plants is that, although they display a great diversity of size and shape, they are made of the same basic constituents, namely cells. A major challenge in developmental biology is therefore to understand how multicellular tissues, originally undifferentiated, can give rise to shapes as complex as those of a flower. We first investigated the uncharacterised signalling network of auxin, since it is a major phytohormone involved in flower organogenesis. We started by determining the potential binary network, then applied model-based graph clustering methods relying on the connectivity profiles of its 52 transcription factors (TFs). We demonstrated that the network can be summarised in three groups, closely related to putative biological groups: the activating auxin response factors (ARF+), the repressing auxin response factors (ARF-), and the Aux/IAAs. We detected self-interaction among the ARF+ and among the Aux/IAAs, as well as interaction between the two, whereas the ARF- show only a limited number of interactions. We therefore propose an auxin-independent mode of competition between ARF+ and ARF- for transcriptional regulation. The network function was characterised using ordinary differential equation modelling, which was later confirmed by experimental observations.
Second, we modelled the influence of the protein dimerisation sequences on the structure of the auxin interactome using a mixture of linear models for random graphs (the French abstract describes these as Gaussian mixture models for random graphs). The groups obtained are close to the previous ones, and the model led us to conclude that these groups behave differently depending on their dimerisation sequence similarities, and that each dimerisation domain might play a different role.
Finally, we changed scale to represent the observed early stages of A. thaliana flower development as a spatio-temporal property graph. Using recent improvements in imaging techniques, we extracted 3D+t cellular features from confocal imaging reconstructions and demonstrated the possibility of identifying and characterising cellular identity on this basis. In that respect, hierarchical clustering methods and hidden Markov trees have proven successful in grouping cells according to their feature similarities.
Keywords: Arabidopsis thaliana; Developmental biology; Auxin signalling pathways; Floral morphogenesis; Transcription factors network; Attributed spatio-temporal graphs; Cellular patterns detection; Unsupervised clustering; Cellular features; Tissue live-imaging
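
Grouping cells by feature similarity with hierarchical clustering can be illustrated by a short sketch. The abstract does not state which linkage or distance was used, so average linkage and Euclidean distance are assumptions here, and the function name `agglomerate` and the per-cell feature tuples are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
from math import dist


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical two-feature descriptions of five cells (assumed data).
cells = [(1.0, 0.1), (1.1, 0.2), (5.0, 3.0), (5.2, 3.1), (9.0, 0.5)]
print(agglomerate(cells, 3))  # [[0, 1], [2, 3], [4]]
```

This brute-force version recomputes every linkage at each merge, which is fine for a handful of cells; for real tissue data a library implementation with cached distances would be the practical choice.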
