Global ETD Search

1	Stochastic Block Model Dynamics Nithish Kumar Kumar (10725294) 29 April 2021 (has links) The past few years have seen an increasing focus on fairness and the long-term impact of algorithmic decision making in the context of machine learning, artificial intelligence and other disciplines. In this thesis, we model hiring processes in enterprises and organizations using dynamic mechanism design. Using a stochastic block model to simulate the workings of a hiring process, we study fairness and long-term evolution in the system. We first present multiple results on a deterministic variant of our model, including convergence and an accurate approximate solution describing the state of the deterministic variant after any time period has elapsed. Using the differential equation method, it can be shown that this deterministic variant is in turn an accurate approximation of the evolution of our stochastic block model with high probability. Finally, we derive upper and lower bounds on the expected state at each time step, and further show that in the long-term limit these upper and lower bounds themselves converge to the state evolution of the deterministic system. These results offer conclusions on the long-term behavior of our model, thereby allowing reasoning on how fairness in organizations could be achieved. We conclude that without sufficient, systematic incentives, under-represented groups will fade from organizations over time. Theoretical Computer Science fairness dynamic systems theory Stochastic block model
2	Unsupervised random walk node embeddings for network block structure representation Lin, Christy 25 September 2021 (has links) There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences in the last few decades. Node embeddings, i.e., Euclidean-space representations of nodes in a network, make it possible to apply to network data, tools and algorithms from multivariate statistics and machine learning that were developed for Euclidean-space data. Random walk node embeddings are a class of recently developed node embedding techniques where the vector representations are learned by optimizing objective functions involving skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems such as link prediction and node classification and have demonstrated state-of-the-art performance. Yet, their properties remain poorly understood. This dissertation studies random walk based node embeddings in an unsupervised setting within the context of capturing hidden block structure in the network, i.e., learning node representations that reflect their patterns of adjacencies to other nodes. This doctoral research (i) Develops VEC, a random walk based unsupervised node embedding algorithm, and a series of relaxations, and experimentally validates their performance for the community detection problem under the Stochastic Block Model (SBM). (ii) Characterizes the ergodic limits of the embedding objectives to create non-randomized versions. (iii) Analyzes the embeddings for expected SBM networks and establishes certain concentration properties of the limiting ergodic objective in the large network asymptotic regime. Comprehensive experimental results on real world and SBM random networks are presented to illustrate and compare the distributional and block-structure properties of node embeddings generated by VEC and related algorithms. 
As a step towards theoretical understanding, it is proved that for the variants of VEC with ergodic limits and convex relaxations, the embedding Grammian of the expected network of a two-community SBM has rank at most 2. Further experiments reveal that these extensions yield embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases. / 2023-09-24T00:00:00Z Computer science Community detection Node embeddings Random walk Stochastic block model
3	Impact de l’échantillonnage sur l’inférence de structures dans les réseaux : application aux réseaux d’échanges de graines et à l’écologie / Impact of sampling on structure inference in networks : application to seed exchange networks and to ecology Tabouy, Timothée 30 September 2019 (has links) Dans cette thèse nous nous intéressons à l’étude du modèle à bloc stochastique (SBM) en présence de données manquantes. Nous proposons une classification des données manquantes en deux catégories Missing At Random et Not Missing At Random pour les modèles à variables latentes suivant le modèle décrit par D. Rubin. De plus, nous nous sommes attachés à décrire plusieurs stratégies d’échantillonnages de réseau et leurs lois. L’inférence des modèles de SBM avec données manquantes est faite par l’intermédiaire d’une adaptation de l’algorithme EM : l’EM avec approximation variationnelle. L’identifiabilité de plusieurs des SBM avec données manquantes a pu être démontrée ainsi que la consistance et la normalité asymptotique des estimateurs du maximum de vraisemblance et des estimateurs avec approximation variationnelle dans le cas où chaque dyade (paire de nœuds) est échantillonnée indépendamment et avec même probabilité. Nous nous sommes aussi intéressés aux modèles de SBM avec covariables, à leurs inférence en présence de données manquantes et comment procéder quand les covariables ne sont pas disponibles pour conduire l’inférence. Finalement, toutes nos méthodes ont été implémenté dans un package R disponible sur le CRAN. Une documentation complète sur l’utilisation de ce package a été écrite en complément. / In this thesis we are interested in studying the stochastic block model (SBM) in the presence of missing data. We propose a classification of missing data into two categories Missing At Random and Not Missing At Random for latent variable models according to the model described by D. Rubin. 
In addition, we have focused on describing several network sampling strategies and their distributions. Inference for SBMs with missing data is carried out through an adaptation of the EM algorithm: EM with variational approximation. The identifiability of several of the SBM models with missing data has been demonstrated, as well as the consistency and asymptotic normality of the maximum likelihood estimators and variational approximation estimators in the case where each dyad (pair of nodes) is sampled independently and with equal probability. We also looked at SBMs with covariates, their inference in the presence of missing data, and how to proceed when covariates are not available to conduct the inference. Finally, all our methods were implemented in an R package available on CRAN. Complete documentation on the use of this package has also been written. Modèle à blocs stochastiques Réseaux Données manquantes Networks Missing data Stochastic Block Model
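The sampling design named in the abstract above, in which each dyad is observed independently and with the same probability, can be illustrated with a minimal sketch. This is not the thesis's R package; the function name `sample_dyads`, the parameter `rho`, and the use of `None` for unobserved dyads are assumptions made for illustration only.

```python
import random

def sample_dyads(adj, rho, seed=0):
    """Observe each dyad (i, j), i < j, independently with probability rho.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    Unobserved dyads (and the diagonal) are marked None, so a missing
    dyad is never confused with an observed non-edge.
    """
    rng = random.Random(seed)
    n = len(adj)
    observed = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The decision to observe does not depend on adj[i][j].
            if rng.random() < rho:
                observed[i][j] = observed[j][i] = adj[i][j]
    return observed

if __name__ == "__main__":
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    for row in sample_dyads(adj, rho=0.5, seed=42):
        print(row)
```

Because the decision to observe a dyad ignores the edge value, this design falls into the Missing At Random category mentioned in the abstract; a design whose observation probability depended on the presence of the edge would not.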
4	Low-rank Matrix Estimation Fan, Xing 01 January 2024 (has links) (PDF) The first part of this dissertation focuses on matrix-covariate regression models. While they have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional matrix-valued covariates. To address these issues, we propose a framework of matrix-covariate regression models based on a low-rank constraint and an additional regularization for structured signals, with considerations of models of both continuous and binary responses. In the second part, we examine a Mixture Multilayer Stochastic Block Model (MMLSBM), where layers can be grouped into sets of similar networks. Each group of networks is endowed with a unique Stochastic Block Model. The objective is to partition the multilayer network into clusters of similar layers and identify communities within those layers. We present an alternative approach called the Alternating Minimization Algorithm (ALMA), which aims to simultaneously recover the layer partition and estimate the matrices of connection probabilities for the distinct layers. In the last part, we demonstrate the effectiveness of the projected gradient descent algorithm. Firstly, its local convergence rate is independent of the condition number. Secondly, under conditions where the objective function is rank-2r restricted L-smooth and μ-strongly convex, with L/μ < 3, projected gradient descent with appropriate step size converges linearly to the solution. Moreover, a perturbed version of this algorithm effectively navigates away from saddle points, converging to an approximate solution or a second-order local minimizer across a wide range of step sizes. Furthermore, we establish that there are no spurious local minimizers in estimating asymmetric low-rank matrices when the objective function satisfies L/μ < 3. 
Low-rank matrix stochastic block model clustering generalized linear model ill-conditioned matrix recovery Mathematics
5	[en] A MIP APPROACH FOR COMMUNITY DETECTION IN THE STOCHASTIC BLOCK MODEL / [pt] UMA ABORDAGEM DE PROGRAMAÇÃO INTEIRA MISTA PARA DETECÇÃO DE COMUNIDADES NO STOCHASTIC BLOCK MODEL BRENO SERRANO DE ARAUJO 04 November 2020 (has links) [pt] O Degree-Corrected Stochastic Block Model (DCSBM) é um modelo popular para geração de grafos aleatórios com estrutura de comunidade, dada uma sequência de graus esperados. O princípio básico de algoritmos que utilizam o DCSBM para detecção de comunidades é ajustar os parâmetros do modelo a dados observados, de forma a encontrar a estimativa de máxima verossimilhança, ou maximum likelihood estimate (MLE), dos parâmetros do modelo. O problema de otimização para o MLE é comumente resolvido por meio de heurísticas. Neste trabalho, propomos métodos de programação matemática, para resolver de forma exata o problema de otimização descrito, e comparamos os métodos propostos com heurísticas baseadas no algoritmo de expectation-maximization (EM). Métodos exatos são uma ferramenta fundamental para a avaliação de heurísticas, já que nos permitem identificar se uma solução heurística é sub-ótima e medir seu gap de otimalidade. / [en] The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model to generate random graphs with community structure given an expected degree sequence. The standard approach of community detection algorithms based on the DCSBM is to search for the model parameters which are the most likely to have produced the observed network data, via maximum likelihood estimation (MLE). Current techniques for the MLE problem are heuristics and therefore do not guarantee convergence to the optimum. We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph. We compare the proposed exact methods with classical heuristic algorithms based on expectation-maximization (EM). 
The solutions given by exact methods give us a principled way of recognizing when heuristic solutions are sub-optimal and measuring how far they are from optimality. [pt] PROGRAMACAO INTEIRA MISTA [pt] APRENDIZADO NAO SUPERVISIONADO [pt] STOCHASTIC BLOCK MODEL [pt] DETECCAO DE COMUNIDADES [pt] BUSCA LOCAL [pt] MACHINE LEARNING [en] MIXED INTEGER PROGRAMMING [en] UNSUPERVISED LEARNING [en] STOCHASTIC BLOCK MODEL [en] COMMUNITY DETECTION [en] LOCAL SEARCH [en] MACHINE LEARNING
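To make the maximum-likelihood objective in the abstract above concrete, the following sketch scores a candidate community assignment under a degree-corrected SBM. It assumes the widely used Poisson profile log-likelihood of Karrer and Newman, sum over group pairs (r, s) of m_rs log(m_rs / (κ_r κ_s)), with constants dropped; the thesis may optimize a different but related formulation. The function name `dcsbm_profile_loglik` and the edge-list input format are hypothetical.

```python
import math

def dcsbm_profile_loglik(edges, groups):
    """Profile log-likelihood (up to constants) of a partition under a DCSBM.

    edges: list of undirected edges (u, v) with integer vertex labels.
    groups: groups[u] is the community index of vertex u.
    m[r][s] counts edges between groups r and s (within-group edges are
    counted twice), and kappa[r] is the total degree of group r.
    """
    k = max(groups) + 1
    m = [[0] * k for _ in range(k)]
    kappa = [0] * k
    for u, v in edges:
        r, s = groups[u], groups[v]
        m[r][s] += 1
        m[s][r] += 1  # when r == s this doubles the within-group count
        kappa[r] += 1
        kappa[s] += 1
    total = 0.0
    for r in range(k):
        for s in range(k):
            if m[r][s] > 0:
                total += m[r][s] * math.log(m[r][s] / (kappa[r] * kappa[s]))
    return total

if __name__ == "__main__":
    # Two triangles joined by a single edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(dcsbm_profile_loglik(edges, [0, 0, 0, 1, 1, 1]))
    print(dcsbm_profile_loglik(edges, [0, 1, 0, 1, 0, 1]))
```

A heuristic such as EM or local search proposes assignments and keeps the one with the highest score; an exact method additionally certifies that no assignment scores higher, which is what allows an optimality gap to be measured.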
6	Analyse statistique des réseaux et applications aux sciences humaines / Statistical analysis of networks and applications in human sciences Zreik, Rawya 30 November 2016 (has links) Depuis les travaux précurseurs de Moreno (1934), l’analyse des réseaux est devenue une discipline forte, qui ne se limite plus à la sociologie et qui est à présent appliquée à des domaines très variés tels que la biologie, la géographie ou l’histoire. L’intérêt croissant pour l’analyse des réseaux s’explique d’une part par la forte présence de ce type de données dans le monde numérique d’aujourd’hui et, d’autre part, par les progrès récents dans la modélisation et le traitement de ces données. En effet, informaticiens et statisticiens ont porté leurs efforts depuis plus d’une dizaine d’années sur ces données de type réseau en proposant de nombreuses techniques permettant leur analyse. Parmi ces techniques on note les méthodes de clustering qui permettent en particulier de découvrir une structure en groupes cachés dans le réseau. De nombreux facteurs peuvent exercer une influence sur la structure d’un réseau ou rendre les analyses plus faciles à comprendre. Parmi ceux-ci, on trouve deux facteurs importants: le facteur du temps, et le contexte du réseau. Le premier implique l’évolution des connexions entre les nœuds au cours du temps. Le contexte du réseau peut alors être caractérisé par différents types d’informations, par exemple des messages texte (courrier électronique, tweets, Facebook, messages, etc.) échangés entre des nœuds, des informations catégoriques sur les nœuds (âge, sexe, passe-temps, statut, etc.), les fréquences d’interaction (par exemple, le nombre de courriels envoyés ou les commentaires affichés), et ainsi de suite. La prise en considération de ces facteurs nous permet de capturer de plus en plus d’informations complexes et cachées à partir des données. 
L’objectif de ma thèse a été de définir de nouveaux modèles de graphes aléatoires qui prennent en compte les deux facteurs mentionnés ci-dessus, afin de développer l’analyse de la structure du réseau et permettre l’extraction de l’information cachée à partir des données. Ces modèles visent à regrouper les sommets d’un réseau en fonction de leurs profils de connexion et structures de réseau, qui sont statiques ou évoluant dynamiquement au cours du temps. Le point de départ de ces travaux est le modèle de bloc stochastique (SBM). Il s’agit d’un modèle de mélange pour les graphes qui a été initialement développé en sciences sociales. Il suppose que les sommets d’un réseau sont répartis sur différentes classes, de sorte que la probabilité d’une arête entre deux sommets ne dépend que des classes auxquelles ils appartiennent. / Over the last two decades, network structure analysis has grown rapidly and found applications in many fields, such as communication networks, financial transaction networks, gene regulatory networks, disease transmission networks, and mobile telephone networks. Social networks are now commonly used to represent the interactions between groups of people; for instance, ourselves, our professional colleagues, our friends and family are often part of online networks, such as Facebook, Twitter, and email. In a network, many factors can exert influence or make analyses easier to understand. Among these, we find two important ones: the time factor, and the network context. The former involves the evolution of connections between nodes over time. The network context can then be characterized by different types of information such as text messages (email, tweets, Facebook, posts, etc.) exchanged between nodes, categorical information on the nodes (age, gender, hobbies, status, etc.), interaction frequencies (e.g., number of emails sent or comments posted), and so on. 
Taking these factors into consideration can lead to the capture of increasingly complex and hidden information from the data. The aim of this thesis is to define new models for graphs which take into consideration the two factors mentioned above, in order to develop the analysis of network structure and allow extraction of the hidden information from the data. These models aim at clustering the vertices of a network depending on their connection profiles and network structures, which are either static or dynamically evolving. The starting point of this work is the stochastic block model, or SBM. This is a mixture model for graphs which was originally developed in social sciences. It assumes that the vertices of a network are spread over different classes, so that the probability of an edge between two vertices only depends on the classes they belong to. Analyse des réseaux Méthodes de clustering Informations complexes Données Modèle de bloc stochastique Network structure analysis Communication networks Financial transaction networks Stochastic block model 519
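The SBM definition given at the end of the abstract above (vertices spread over latent classes, edge probability depending only on the two classes) can be sketched as a short sampler. This is an illustrative sketch, not code from any of the listed theses; the name `sample_sbm` and its parameters `class_probs` and `edge_probs` are assumptions, and the graph is taken to be undirected without self-loops.

```python
import random

def sample_sbm(n, class_probs, edge_probs, seed=0):
    """Draw one undirected graph from a stochastic block model.

    class_probs[q] is the probability that a vertex belongs to class q.
    edge_probs[q][l] is the probability of an edge between a vertex of
    class q and a vertex of class l (assumed symmetric).
    Returns the class labels and a 0/1 adjacency matrix.
    """
    rng = random.Random(seed)
    classes = list(range(len(class_probs)))
    # Step 1: assign each vertex to a latent class.
    z = rng.choices(classes, weights=class_probs, k=n)
    # Step 2: draw each dyad independently given the two classes.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probs[z[i]][z[j]]:
                adj[i][j] = adj[j][i] = 1
    return z, adj

if __name__ == "__main__":
    z, adj = sample_sbm(8, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], seed=1)
    print(z)
    for row in adj:
        print(row)
```

With a large within-class probability and a small between-class probability, the sampled graph shows the community structure that clustering methods based on the SBM try to recover from the adjacency matrix alone.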
7	Continuous Time Models for Epidemic Processes and Contact Networks Ahmad, Rehan January 2021 (has links) No description available. Computer Science Epidemiology Mathematics Sociology Statistics Stochastic Block Model Epidemic Modeling Epidemiology Continuous-Time Network Model Continuous-Time Epidemic Model Dynamic Networks Point Process Model
8	Modeling, Evaluation and Analysis of Dynamic Networks for Social Network Analysis Junuthula, Ruthwik Reddy January 2018 (has links) No description available. Computer Science Mathematics Sociology Statistics Variational Inference Social Network Analysis Continuous Time Network Models Machine Learning Stochastic Block Model Local Search Interaction Networks Link Prediction
9	Statistical inference on random graphs and networks / Inferência estatística para grafos aleatórios e redes Cerqueira, Andressa 28 February 2018 (has links) In this thesis we study two probabilistic models defined on graphs: the Stochastic Block model and the Exponential Random Graph. Therefore, this thesis is divided in two parts. In the first part, we introduce the Krichevsky-Trofimov estimator for the number of communities in the Stochastic Block Model and prove its eventual almost sure convergence to the underlying number of communities, without assuming a known upper bound on that quantity. In the second part of this thesis we address the perfect simulation problem for the Exponential random graph model. We propose an algorithm based on the Coupling From The Past algorithm using a Glauber dynamics. This algorithm is efficient in the case of monotone models. We prove that this is the case for a subset of the parametric space. We also propose an algorithm based on the Backward and Forward algorithm that can be applied for monotone and non monotone models. We prove the existence of an upper bound for the expected running time of both algorithms. / Nessa tese estudamos dois modelos probabilísticos definidos em grafos: o modelo estocástico por blocos e o modelo de grafos exponenciais. Dessa forma, essa tese está dividida em duas partes. Na primeira parte nós propomos um estimador penalizado baseado na mistura de Krichevsky-Trofimov para o número de comunidades do modelo estocástico por blocos e provamos sua convergência quase certa sem considerar um limitante conhecido para o número de comunidades. Na segunda parte dessa tese nós abordamos o problema de simulação perfeita para o modelo de grafos aleatórios Exponenciais. Nós propomos um algoritmo de simulação perfeita baseado no algoritmo Coupling From the Past usando a dinâmica de Glauber. 
Esse algoritmo é eficiente apenas no caso em que o modelo é monótono e nós provamos que esse é o caso para um subconjunto do espaço paramétrico. Nós também propomos um algoritmo de simulação perfeita baseado no algoritmo Backward and Forward que pode ser aplicado a modelos monótonos e não monótonos. Nós provamos a existência de um limitante superior para o número esperado de passos de ambos os algoritmos. Algoritmo Backward and Forward Algoritmo Coupling From the Past Backward and Forward algorithm Coupling From the Past algorithm Estimação Estimation Exponential random graph Grafos aleatórios exponenciais Krichevsky-Trofimov Krichevsky-Trofimov Modelo estocástico por blocos Perfect simulation Simulação perfeita Stochastic block model
10	Statistical inference on random graphs and networks / Inferência estatística para grafos aleatórios e redes Andressa Cerqueira 28 February 2018 (has links) In this thesis we study two probabilistic models defined on graphs: the Stochastic Block model and the Exponential Random Graph. Therefore, this thesis is divided in two parts. In the first part, we introduce the Krichevsky-Trofimov estimator for the number of communities in the Stochastic Block Model and prove its eventual almost sure convergence to the underlying number of communities, without assuming a known upper bound on that quantity. In the second part of this thesis we address the perfect simulation problem for the Exponential random graph model. We propose an algorithm based on the Coupling From The Past algorithm using a Glauber dynamics. This algorithm is efficient in the case of monotone models. We prove that this is the case for a subset of the parametric space. We also propose an algorithm based on the Backward and Forward algorithm that can be applied for monotone and non monotone models. We prove the existence of an upper bound for the expected running time of both algorithms. / Nessa tese estudamos dois modelos probabilísticos definidos em grafos: o modelo estocástico por blocos e o modelo de grafos exponenciais. Dessa forma, essa tese está dividida em duas partes. Na primeira parte nós propomos um estimador penalizado baseado na mistura de Krichevsky-Trofimov para o número de comunidades do modelo estocástico por blocos e provamos sua convergência quase certa sem considerar um limitante conhecido para o número de comunidades. Na segunda parte dessa tese nós abordamos o problema de simulação perfeita para o modelo de grafos aleatórios Exponenciais. Nós propomos um algoritmo de simulação perfeita baseado no algoritmo Coupling From the Past usando a dinâmica de Glauber. 
Esse algoritmo é eficiente apenas no caso em que o modelo é monótono e nós provamos que esse é o caso para um subconjunto do espaço paramétrico. Nós também propomos um algoritmo de simulação perfeita baseado no algoritmo Backward and Forward que pode ser aplicado a modelos monótonos e não monótonos. Nós provamos a existência de um limitante superior para o número esperado de passos de ambos os algoritmos. Algoritmo Backward and Forward Algoritmo Coupling From the Past Estimação Grafos aleatórios exponenciais Krichevsky-Trofimov Modelo estocástico por blocos Simulação perfeita Backward and Forward algorithm Coupling From the Past algorithm Estimation Exponential random graph Krichevsky-Trofimov Perfect simulation Stochastic block model
