Global ETD Search

111	Nouvelles méthodes pour l’apprentissage non-supervisé en grandes dimensions. / New methods for large-scale unsupervised learning. Tiomoko Ali, Hafiz 24 September 2018
Spurred by recent advances in the theoretical analysis of the performance of machine learning algorithms, this thesis tackles the performance analysis and improvement of clustering for high-dimensional data and graphs. In the first and larger part of the thesis, advanced tools from random matrix theory are used to analyze the performance of spectral methods on dense, realistic graph models and on high-dimensional kernel random matrices, through the study of the eigenvalues and eigenvectors of the similarity matrices characterizing those data. New, improved methods are proposed on the basis of this theoretical analysis and are shown, through numerous simulations, to outperform state-of-the-art approaches. In the second part, a new algorithm is proposed for detecting heterogeneous communities across the layers of a multi-layer graph, using a variational Bayes approach to approximate the posterior distribution of the model's latent variables. The proposed methods are applied to synthetic benchmarks as well as real-world datasets and are shown to outperform standard clustering approaches in those specific contexts.
Unsupervised learning; High dimensional data clustering; Community detection; Random Matrix Theory; Bayesian inference
112	Monolith to microservices using deep learning-based community detection / Monolit till mikrotjänster med hjälp av djupinlärningsbaserad klusterdetektion Bothin, Anton January 2023
The microservice architecture is widely considered to be best practice. Yet many companies still work in monolithic systems, largely because updating a system's architecture is a difficult process. The first step in this process is to identify microservices within a monolith, and artificial intelligence could be a useful tool for automating that identification. The aim of this thesis was to propose a deep learning-based model for microservice identification and to compare it with previously proposed approaches, with the goal of helping companies move towards a microservice-based architecture. In particular, the thesis evaluated whether the more complex nature of newer deep learning-based techniques can be used to identify better microservices. The proposed model is based on overlapping community detection, where each identified community is considered a microservice candidate. The model was evaluated in terms of cohesion, modularity, and size. Results indicate that the proposed deep learning-based model performs similarly to other state-of-the-art approaches to microservice identification. The results also suggest that deep learning helps in finding nontrivial relations within communities, which increases the overall quality of the identified microservices. From this it can be concluded that deep learning is a promising technique for microservice identification and that further research is warranted.
Community detection; Deep learning; Graph neural network; Microservice; System architecture; Computer and Information Sciences
113	Der Einfluss der Länge von Beobachtungszeiträumen auf die Identifizierung von Subgruppen in Online Communities / The influence of the length of observation periods on the identification of subgroups in online communities Zeini, Sam, Göhnert, Tilman, Hecking, Tobias, Krempel, Lothar, Hoppe, H. Ulrich January 2013
The spread of social media, and with it the emergence and growth of communities on the Internet, is leading to an increase in analysable digital traces, many of which are publicly accessible. These traces can be evaluated with various analytical techniques, such as social network analysis [1]. Approaches to "community detection" are particularly popular; among other things, they make it possible to identify innovative sub-communities and subgroups, for example in large open source projects [2]. These applications raise new methodological and fundamental questions, including that of the role of time in such analyses. While the representation of dynamic effects (e.g. through animations) includes time as an explicit parameter, the choice of time intervals for aggregating the data from which networks are then derived enters only implicitly into the premises of the method. Unlike the analysis of dynamics, these effects have so far hardly been studied. In social network analysis, the target representation itself is no longer time-dependent but is, so to speak, a "static snapshot", so that time-dependent interaction patterns, for instance, cannot be detected. (...)
info:eu-repo/classification/ddc/330 ddc:330
114	Constructing and representing a knowledge graph(KG) for Positive Energy Districts (PEDs) Davari, Mahtab January 2023
In recent years, knowledge graphs (KGs) have become essential tools for visualizing concepts and retrieving contextual information. However, constructing KGs for new and specialized domains like Positive Energy Districts (PEDs) presents unique challenges, particularly when dealing with unstructured texts and ambiguous concepts from academic articles. This study focuses on various strategies for constructing and inferring KGs, specifically incorporating entities related to PEDs, such as projects, technologies, organizations, and locations. We use visualization techniques and node embedding methods to explore the graph's structure and content, and apply filtering techniques and t-SNE plots to extract subgraphs based on specific categories or keywords. One of the key contributions is the use of the longest path method, which allows us to uncover intricate relationships, interconnectedness between entities, critical paths, and hidden patterns within the graph, providing insight into the most significant connections. Additionally, community detection techniques were employed to identify distinct communities within the graph, providing further understanding of its structural organization and of clusters of interconnected nodes with shared themes. The paper also presents a detailed evaluation of a question-answering system based on the KG, in which the Universal Sentence Encoder converts text into dense vector representations and cosine similarity between those vectors is used to find similar sentences. We assess the system's performance through precision and recall analysis and conduct statistical comparisons of graph embeddings, with Node2Vec outperforming DeepWalk in capturing similarities and connections. For edge prediction, logistic regression, focusing on pairs of neighbours that lack a direct connection, was employed to identify potential connections among nodes within the graph. Probabilistic edge predictions, threshold analysis, and the significance of individual nodes are also discussed. Lastly, the advantages and limitations of using existing KGs (Wikidata and DBpedia) versus constructing new ones specifically for PEDs were investigated. Further research and data enrichment are necessary to address the scarcity of domain-specific information in existing sources.
Knowledge graph; Positive Energy Districts (PEDs); longest path; Questions and Answers; Community Detection; Node Embedding; t-SNE plots; Edge Prediction; Computer Sciences
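The sentence-matching step mentioned in entry 114 (dense sentence vectors compared by cosine similarity) can be illustrated with a minimal sketch. The vectors below are hypothetical toy values rather than Universal Sentence Encoder output, and the function names `cosine_similarity` and `most_similar` are illustrative assumptions, not taken from the thesis.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)

def most_similar(query_vec, sentence_vecs):
    """Return (index, score) of the stored vector closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in sentence_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy 3-dimensional "embeddings" (hypothetical values, not encoder output).
sentences = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.5, 0.9, 0.0]
index, score = most_similar(query, sentences)
```

In a real system the vectors would come from a sentence encoder and have hundreds of dimensions; the ranking logic stays the same.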
115	Bullying Detection through Graph Machine Learning : Applying Neo4j’s Unsupervised Graph Learning Techniques to the Friends Dataset Enström, Olof, Eid, Christoffer January 2023
In recent years, the pervasive issue of bullying, particularly in academic institutions, has received a surge of attention. This report centers on the use of the Friends Dataset and Graph Machine Learning to detect possible instances of bullying in an educational setting. The importance of this research lies in its potential to enhance early detection and prevention mechanisms, thereby creating safer environments for students. Leveraging graph theory, Neo4j, the Graph Data Science Library, and similarity algorithms, among other tools and methods, we devised an approach for processing and analyzing the dataset. Our method involves data preprocessing, application of similarity and community detection algorithms, and validation of results with domain experts. The findings indicate that Graph Machine Learning can be used effectively to identify potential bullying scenarios, with a particular focus on discerning community structures and their influence on bullying. Our results, albeit preliminary, represent a promising step towards leveraging technology for bullying detection and prevention.
Bullying Graph Machine Learning Community Detection Neo4j Data Preprocessing Similarity Algorithms Friends Neo4j Unsupervised Learning Anti-bullying Computer Sciences
116	Node Centric Community Detection and Evolutional Prediction in Dynamic Networks Oluwafolake A Ayano (13161288) 27 July 2022
Advances in technology have led to the availability of data from different platforms such as the web and social media. Much of this data can be represented as a network consisting of a set of nodes connected by edges. The nodes represent the items in the network, while the edges represent the interactions between the nodes. Community detection methods have been used extensively to analyze these networks. However, community detection in evolving networks remains a significant challenge because of the frequent changes to the networks and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network's history and cannot provide real-time information about the communities in the network. Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all its edges connecting simultaneously. Processing such large networks efficiently and in a timely manner requires an adaptive analytical method that does not recompute the entire network after it evolves and that treats all the edges involved with a node equally. We propose a node-centric community detection method that incrementally updates the community structure using the already known structure of the network, which avoids recomputing the entire network from scratch and consequently achieves a high-quality community structure. The results from our experiments suggest that our approach is efficient for incremental community detection in node-centric evolving networks.
Data engineering and data science; Data mining and knowledge discovery; Graph, social and multimedia data; Community Detection; Dynamic Networks; IP Networks; Clustering; Big Data Analytics
117	Statistical Analysis of Structured High-dimensional Data Sun, Yizhi 05 October 2018
High-dimensional data such as multi-modal neuroimaging data and large-scale networks carry an enormous amount of information and can be used to test various scientific hypotheses or discover important patterns in complicated systems. While considerable efforts have been made to analyze high-dimensional data, existing approaches often rely on simple summaries that can miss important information, and many challenges in modeling complex structures in data remain unaddressed. In this dissertation, we focus on analyzing structured high-dimensional data, including functional data with important local regions and network data with community structures. The first part of this dissertation concerns the detection of "important" regions in functional data. We propose a novel Bayesian approach that enables region selection in the functional data regression framework. Regions are selected by encouraging sparse estimation of the regression coefficient function, where nonzero regions correspond to the regions that are selected. To achieve sparse estimation, we adopt a compactly supported and potentially over-complete basis to capture local features of the regression coefficient function, and place a spike-and-slab prior on the coefficients of the basis functions. To encourage continuous shrinkage of nearby regions, we assume an Ising hyper-prior that takes into account the neighboring structure of the basis functions; this neighboring structure is represented by an undirected graph. We perform posterior sampling through Markov chain Monte Carlo algorithms. The practical performance of the proposed approach is demonstrated through simulations as well as near-infrared and sonar data. The second part of this dissertation focuses on constructing diversified portfolios using stock return data in the Center for Research in Security Prices (CRSP) database maintained by the University of Chicago. Diversification is a risk management strategy that involves mixing a variety of financial assets in a portfolio. This strategy helps reduce the overall risk of the investment and improve the performance of the portfolio. To construct portfolios that effectively diversify risks, we first construct a co-movement network using the correlations between stock returns over a training time period. Correlation characterizes the synchrony among stock returns and thus helps us understand whether two or more stocks have common risk attributes. Based on the co-movement network, we apply multiple network community detection algorithms to detect groups of stocks with common co-movement patterns. Stocks within the same community tend to be highly correlated, while stocks in different communities tend to be less correlated. A portfolio is then constructed by selecting stocks from different communities. Finally, the average return of the constructed portfolio over a testing time period is compared with the S&P 500 market index. Our constructed portfolios demonstrate outstanding performance during a non-crisis period (2004-2006) and good performance during a financial crisis period (2008-2010).
/ PHD / High-dimensional data, which are composed of data points with a tremendous number of features (a.k.a. attributes, independent variables, explanatory variables), bring challenges to statistical analysis due to their "high-dimensionality" and complicated structure. In this dissertation work, I consider two types of high-dimensional data. The first type is functional data, in which each observation is a function. The second type is network data, whose internal structure can be described as a network. I aim to detect "important" regions in functional data by using a novel statistical model, and I treat stock market data as network data to construct quality portfolios efficiently.
Bayesian Variable Selection; Community Detection; Compactly Supported Basis; Functional Data Analysis; Ising Prior; MCMC; Network Data Analysis; Portfolio Theory; Region Selection
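The portfolio pipeline outlined in entry 117 (correlation-based co-movement network, grouping of stocks, one pick per group) can be sketched as follows. This is only an illustration under stated assumptions: the tickers and returns are made up, the 0.5 threshold is arbitrary, and connected components of the thresholded network stand in for the community detection algorithms, which the abstract does not name.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length return series.
    Assumes neither series is constant (otherwise this divides by zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def comovement_groups(returns, threshold=0.5):
    """Link stocks whose correlation is >= threshold, then return the
    connected components of that network (a simple stand-in for a
    community detection algorithm)."""
    names = sorted(returns)
    neighbours = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(returns[a], returns[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    groups, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(group))
    return groups

def pick_one_per_group(groups):
    """Diversify by taking the first (alphabetical) stock of each group."""
    return [group[0] for group in groups]

# Hypothetical tickers and training-period returns, for illustration only.
returns = {
    "AAA": [0.01, 0.02, -0.01, 0.03, -0.02],
    "BBB": [0.02, 0.03, -0.02, 0.05, -0.03],
    "CCC": [-0.01, 0.01, 0.02, -0.02, 0.01],
    "DDD": [-0.02, 0.02, 0.03, -0.03, 0.02],
}
groups = comovement_groups(returns)
portfolio = pick_one_per_group(groups)
```

With these toy series, AAA and BBB move together, as do CCC and DDD, so the sketch picks one stock from each pair. A real study would replace the component step with a proper community detection method and evaluate returns on a held-out testing period.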
118	Probabilistic inference in ecological networks : graph discovery, community detection and modelling dynamic sociality Psorakis, Ioannis January 2013
This thesis proposes a collection of analytical and computational methods for inferring the underlying social structure of a given population, observed only via timestamped occurrences of its members across a range of locations. It shows that such data streams have a modular and temporally focused structure, neither fully ordered nor completely random, with individuals appearing in "gathering events". By exploiting this structure, the thesis proposes a mapping of those spatio-temporal data streams to a social network, based on the co-occurrences of agents across gathering events, while capturing the uncertainty over social ties through probability distributions. Given the extracted graphs, an approach is proposed for studying their community organisation. The method treats communities as explanatory variables for the observed interactions, producing overlapping partitions and scores for the membership of nodes in groups. These models are motivated by a large ongoing experiment at Wytham Woods, Oxford, where a population of wild Parus major birds is tagged with RFID devices and a grid of feeding locations generates thousands of spatio-temporal records each year. The proposed methods are applied to this data set to demonstrate how they can be used to explore wild bird sociality, reveal its internal organisation across a variety of scales, and provide insights into important biological processes relating to mating pair formation.
519.2
119	Mineração de estruturas musicais e composição automática utilizando redes complexas / Musical structures mining and composition using complex networks Salazar, Andrés Eduardo Coca 26 November 2014
The theory of complex networks has increasingly become a powerful computational tool capable of representing, characterizing and examining systems with non-trivial structure, revealing both local and global intrinsic characteristics that facilitate the understanding of the behavior and dynamics of such systems. This thesis explores the advantages of complex networks in solving problems related to musical tasks. Specifically, three approaches are studied: pattern recognition, data mining, and music synthesis. The first approach is addressed by developing a method for extracting the rhythmic pattern of a piece of popular music. In such pieces, different types of rhythmic patterns coexist and form a hierarchy determined by functional aspects within the rhythmic base. The main rhythmic patterns are characterized by a higher incidence within the musical discourse, a property that is reflected in the formation of communities within the network constructed from the piece. Community detection techniques are applied to extract the rhythmic patterns, and a measure for distinguishing the main patterns from the secondary ones is proposed. The results show that the quality of the extraction is sensitive to the detection algorithm, to the way rhythm is represented, and to the treatment of percussion lines when generating the network. Data mining is performed using topological measures on the network obtained after the removal of secondary patterns. Supervised and unsupervised learning techniques are applied to discriminate the musical genre according to the attributes calculated in the data mining phase. The results show the efficiency of the proposed methodology, which is confirmed by a test of statistical significance. The last approach, melody generation, is addressed by developing composition models from two perspectives: the first uses a criteria-controlled walk on predefined complex networks, and the second uses recurrent neural networks and chaotic dynamical systems. In the latter, the model is trained to compose a melody with a pre-established value of some subjective tonal characteristic, by means of a proportional control strategy that modifies the complexity of a chaotic melody, which serves as the inspiration input to the network.
Artificial neural networks; Chaotic dynamical systems; Community detection; Complex networks; Music genre recognition; Rhythmic pattern recognition
120	由職官年表中利用循序共現樣式探勘人脈網絡 / Social network analysis from official chronology using sequential co-occurrence pattern mining 宋邡熏, Song, Fang Shiun Unknown Date
In a political power structure, chief officials and cliques play important roles in the social network of political figures and have a strong influence on politics. This thesis proposes an approach to social network mining from official chronologies in order to discover the chief officials and the cliques. We propose and develop a data mining algorithm that discovers sequential co-occurrence patterns in official chronologies, capturing co-occurring promotions and demotions of government officials. A social network of officials is then constructed from the discovered sequential co-occurrence patterns. Chief officials are discovered through network centrality analysis, while cliques are discovered through community detection on the constructed social network. Experiments are conducted on the official chronology of the Kangxi period of the Qing dynasty, and the visualization analysis demonstrates that the proposed methods can assist historians in their research.
Social Network Mining; Network Centrality; Community Detection; Historical Document Mining; Official Chronology
