Global ETD Search

1	Modeling and projection of respondent driven network samples Zhuang, Zhihe January 1900 (has links) Master of Science / Department of Statistics / Perla E. Reyes Cuellar / The term network has become part of our everyday vocabulary. The most familiar networks are perhaps social ones, but the concept also includes business partnerships, literature citations, and biological networks, among others. Formally, networks are defined as sets of items and their connections. Often modeled as the mathematical object known as a graph, networks have been studied extensively for several years, and research is widely available. In statistics, a variety of modeling techniques and statistical terms have been developed to analyze them and predict individual behaviors. Specifically, statistics such as the degree distribution and the clustering coefficient are considered important indicators in traditional social network studies. However, while conventional network models assume that the whole network population is known, complete information is not always available. Thus, sampling methods are often required when the population data is inaccessible. Less time has been dedicated to studying how accurately these sampling methods produce a representative sample. As such, the aim of this report is to identify the capacity of sampling techniques to reflect the features of the original network. In particular, we study Anti-cluster Respondent Driven Sampling (AC-RDS). We also explore whether standard modeling techniques paired with sample data could estimate statistics often used in the study of social networks. Respondent Driven Sampling (RDS) is a chain referral approach to study rare and/or hidden populations. Originating from the link-tracing design, RDS has been further developed into a series of methods utilized in social network studies, such as locating target populations or estimating the number and proportion of needle-sharing among drug addicts.
However, RDS does not always perform as well as expected. When the social network contains tight communities (or clusters) with few connections between them, traditional RDS tends to oversample one community, introducing bias. AC-RDS is a special Markov chain process that collects samples across communities, capturing the whole network. With special referral requests, the initial seeds are more likely to refer individuals who are outside their communities. In this report, we fitted the Exponential Random Graph Model (ERGM) and a Stochastic Block Model (SBM) to an empirical study of the Facebook friendship network of 1034 participants. Then, given our goal of identifying techniques that will produce a representative sample, we compared two versions of AC-RDS, in addition to traditional RDS, with Simple Random Sampling (SRS). We compared the methods by drawing 100 network samples with each sampling technique, fitting an SBM to each sample network, and using the results to project the sample to a network of population size. We calculated essential network statistics, such as the degree distribution, for each sampling method and then compared the results to the statistics observed in the original network. Networks Respondent Driven Sampling Nonparametric Bayesian Sampling Methods Stochastic Blockmodel
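The oversampling effect described in the abstract above can be illustrated with a minimal sketch. It assumes a synthetic two-community graph rather than the Facebook data, and it stands in for RDS with a one-referral random walk; it does not implement AC-RDS. All function names and parameter values here are illustrative, not taken from the report.

```python
import random


def planted_partition(n_per_block, p_in, p_out, rng):
    """Two-community graph as adjacency lists.

    Nodes 0..n_per_block-1 form block 0, the remaining nodes form block 1.
    Pairs inside a block are linked with probability p_in, pairs across
    blocks with probability p_out.
    """
    n = 2 * n_per_block
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            same_block = (u < n_per_block) == (v < n_per_block)
            if rng.random() < (p_in if same_block else p_out):
                adj[u].append(v)
                adj[v].append(u)
    return adj


def referral_chain(adj, seed, size, rng):
    """Simplified chain referral: each respondent refers one neighbour
    chosen uniformly at random (sampling with replacement)."""
    sample, current = [seed], seed
    while len(sample) < size and adj[current]:
        current = rng.choice(adj[current])
        sample.append(current)
    return sample


def share_block0(sample, n_per_block):
    """Fraction of sampled nodes that belong to block 0."""
    return sum(v < n_per_block for v in sample) / len(sample)


rng = random.Random(0)
N_PER_BLOCK, SAMPLE_SIZE, REPLICATES = 100, 30, 100
adj = planted_partition(N_PER_BLOCK, p_in=0.2, p_out=0.001, rng=rng)

# Chain referral always starts from node 0, which lies in block 0.
rds_shares = [
    share_block0(referral_chain(adj, 0, SAMPLE_SIZE, rng), N_PER_BLOCK)
    for _ in range(REPLICATES)
]
# Simple random sampling draws nodes uniformly without replacement.
srs_shares = [
    share_block0(rng.sample(range(2 * N_PER_BLOCK), SAMPLE_SIZE), N_PER_BLOCK)
    for _ in range(REPLICATES)
]
rds_mean = sum(rds_shares) / REPLICATES
srs_mean = sum(srs_shares) / REPLICATES
print(f"mean share of block 0: chain referral {rds_mean:.2f}, SRS {srs_mean:.2f}")
```

With few links between the two blocks, the referral chain rarely leaves the seed's community, so its samples are dominated by block 0, whereas simple random sampling covers both blocks about equally.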
2	Bayesian stochastic blockmodels for community detection in networks and community-structured covariance selection Peng, Lijun 08 April 2016 (has links) Networks have been widely used to describe interactions among objects in diverse fields. Given the interest in explaining a network by its structure, much attention has been drawn to finding clusters of nodes with dense connections within clusters but sparse connections between clusters. Such clusters are called communities, and identifying such clusters is known as community detection. Here, to perform community detection, I focus on stochastic blockmodels (SBM), a class of statistically-based generative models. I present a flexible SBM that represents different types of data as well as node attributes under a Bayesian framework. The proposed models explicitly capture community behavior by guaranteeing that connections are denser within communities than between communities. First, I present a degree-corrected SBM based on a logistic regression formulation to model binary networks. To fit the model, I obtain posterior samples via Gibbs sampling based on Polya-Gamma latent variables. I conduct inference based on a novel, canonically mapped centroid estimator that formally addresses label non-identifiability and captures representative community assignments. Next, to accommodate large-scale datasets, I further extend the degree-corrected SBM to a broader family of generalized linear models with group correction terms. To conduct exact inference efficiently, I develop an iteratively-reweighted least squares procedure that implicitly updates sufficient statistics on the network to obtain maximum a posteriori (MAP) estimators. I demonstrate the proposed model and estimation on simulated benchmark networks and various real-world datasets. Finally, I develop a Bayesian SBM for community-structured covariance selection. 
Here, I assume that the data at each node are Gaussian and that there is a latent network in which two nodes are not connected if their observations are conditionally independent given the observations of the other nodes. In the context of biological and social applications, I expect that this latent network shows a block dependency structure that represents community behavior. Thus, to identify the latent network and detect communities, I propose a hierarchical prior in two levels: a spike-and-slab prior on the off-diagonal entries of the concentration matrix for variable selection and a degree-corrected SBM to capture community behavior. I develop an efficient routine based on ridge regularization and MAP estimation to conduct inference. Statistics Centroid estimation Community detection Covariance selection Degree correction Stochastic blockmodel
3	A Model for Seasonal Dynamic Networks Robinson, Jace D. 16 May 2018 (has links) No description available. Computer Science Artificial Intelligence Information Science Stochastic Blockmodel Dynamic Networks Seasonal Time Series Kalman Filter
4	Hypothesis testing and community detection on networks with missingness and block structure Guilherme Maia Rodrigues Gomes (8086652) 06 December 2019 (has links) Statistical analysis of networks has grown rapidly over the last few years, with an increasing number of applications. Graph-valued data carries additional information about dependencies, which opens the possibility of modeling highly complex objects in a vast number of fields such as biology (e.g. brain networks, fungi networks, gene co-expression), chemistry (e.g. molecular fingerprints), psychology (e.g. social networks) and many others (e.g. citation networks, word co-occurrences, financial systems, anomaly detection). While including graph structure in the analysis can further help inference, even simple statistical tasks on a network are very complex. For instance, the assumption of exchangeability of the nodes or the edges is quite strong, and it brings issues such as sparsity, size bias and poor characterization of the generative process of the data. Solutions to these issues include adding specific constraints and assumptions on the data generation process. In this work, we approach this problem by assuming graphs are globally sparse but locally dense, which allows the exchangeability assumption to hold in local regions of the graph. We consider problems with two types of locality structure: block structure (also framed as multiple graphs or a population of networks) and unstructured sparsity, which can be seen as missing data. For the former, we developed a hypothesis testing framework for weighted aligned graphs and a spectral clustering method for community detection on a population of non-aligned networks. For the latter, we derive an efficient spectral clustering approach to learn the parameters of the zero-inflated stochastic blockmodel. Overall, we found that incorporating multiple local dense structures leads to more precise and powerful local and global inference.
This result indicates that this general modeling scheme allows the exchangeability assumption on the edges to hold while generating more realistic graphs. We give theoretical conditions for our proposed algorithms and evaluate them on synthetic and real-world datasets, showing that our models are able to outperform the baselines in a number of settings. Statistics Statistical Network Analysis hypothesis testing community detection Bayesian hypothesis testing multiple graphs population of networks stochastic blockmodel
5	Recherche de structure dans un graphe aléatoire : modèles à espace latent / Clustering in a random graph: models with latent space Channarond, Antoine 10 December 2013 (has links) This thesis addresses the clustering of the nodes of a graph in the framework of random models with latent variables. Each node i is assigned an unobserved (latent) random variable Z_i, and the probability that nodes i and j are connected depends conditionally on Z_i and Z_j. Unlike in the Erdős-Rényi model, connections are not independent and identically distributed; the latent variables govern the connection distribution of the nodes. These models are thus heterogeneous, and their structure is described by the latent variables and their distribution, which is why we aim to infer them from the graph, the only observed data. In both original works of this thesis, we propose consistent inference methods with a computational cost at most linear in the number of nodes or edges, so that large graphs can be processed in a reasonable time. Both are based on a fine study of the degree distribution, suitably normalized according to the model. The first work deals with the Stochastic Blockmodel. We show the consistency of an unsupervised classification algorithm using concentration inequalities. We deduce from it a parameter estimation method, a model selection method for the number of latent classes, and a clustering test (testing whether there is one latent class or several), all of which are proved to be consistent. In the second work, the latent variables are positions in the space ℝ^d with density f, and the connection probability depends on the distance between the node positions.
The clusters are defined as the connected components of the level set of f at a fixed level t > 0, and the goal is to estimate their number from the observed graph alone. We estimate the density at the latent positions of the nodes from their degrees, which establishes a correspondence between the clusters and the connected components of certain subgraphs of the observed graph, obtained by removing low-degree nodes. In particular, we derive an estimator of the number of clusters and show its consistency in a certain sense. Statistics Random graphs Stochastic Blockmodel Hidden or latent variables models Clustering Unsupervised classification Parametric estimation Model selection Linkage Non-parametric estimation Level sets
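As a rough illustration of why degrees alone can carry class information in a Stochastic Blockmodel, here is a minimal sketch. It assumes two latent classes with clearly different expected degrees and splits the nodes at the largest gap in the sorted degree sequence. This is one simple degree-based rule chosen for illustration, not necessarily the algorithm of the thesis, and the names and parameter values are hypothetical.

```python
import random


def sample_sbm_degrees(sizes, probs, rng):
    """Draw an SBM and return (labels, degrees).

    sizes[k] is the number of nodes in class k; probs[k][l] is the
    connection probability between a class-k node and a class-l node.
    Only degrees are kept, since the rule below uses nothing else.
    """
    labels = [k for k, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[labels[i]][labels[j]]:
                degree[i] += 1
                degree[j] += 1
    return labels, degree


def largest_gap_split(degree):
    """Assign group 0 below and group 1 above the largest gap in the
    sorted degrees. Dividing degrees by n - 1 would not change the split."""
    n = len(degree)
    order = sorted(range(n), key=lambda i: degree[i])
    gaps = [degree[order[k + 1]] - degree[order[k]] for k in range(n - 1)]
    cut = max(range(n - 1), key=gaps.__getitem__)
    groups = [0] * n
    for rank, node in enumerate(order):
        groups[node] = 0 if rank <= cut else 1
    return groups


rng = random.Random(1)
# Class 0 is sparse and class 1 is dense, so expected degrees differ by class.
labels, degree = sample_sbm_degrees(
    sizes=[200, 200],
    probs=[[0.1, 0.3], [0.3, 0.6]],
    rng=rng,
)
groups = largest_gap_split(degree)
accuracy = sum(g == l for g, l in zip(groups, labels)) / len(labels)
print(f"share of nodes assigned to their true class: {accuracy:.2f}")
```

The split itself costs only a sort of the degree sequence once the degrees are known. It relies on the classes having well-separated expected degrees; when two classes share the same expected degree, a degree-only rule cannot tell them apart.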
