
Statistical solutions for multiple networks

Networks are quickly becoming one of the most common data types across diverse disciplines from the biological to the social sciences. Consequently, the study of networks as data objects is fundamental to developing statistical methodology for answering complex scientific questions. In this dissertation, we provide statistical solutions to three tasks related to multiple networks.

We first consider the task of prediction given a collection of observed networks. In particular, we provide a Bayesian approach to performing classification, anomaly detection, and survival analysis with network inputs. Our methodology is based on encoding networks as pairwise differences in the kernel of a Gaussian process prior and we are motivated by the goal of predicting preterm delivery using individual microbiome networks.
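As a rough illustration of encoding networks through pairwise differences in a kernel, the sketch below builds a covariance matrix over a collection of networks. The abstract does not say which distance or kernel the dissertation uses, so the Frobenius distance between adjacency matrices, the squared-exponential form, and the names `network_kernel` and `lengthscale` are all assumptions made for this example.

```python
import numpy as np

def network_kernel(adjs, lengthscale=1.0):
    """Squared-exponential kernel on pairwise Frobenius distances.

    Assumption: each network is a same-sized adjacency matrix with
    aligned node labels. This is an illustrative choice, not
    necessarily the kernel used in the dissertation.
    """
    # Flatten each adjacency matrix into a vector.
    X = np.stack([np.asarray(a, dtype=float).ravel() for a in adjs])
    # Squared Euclidean distance between every pair of vectors,
    # which equals the squared Frobenius distance between matrices.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

# Four random undirected networks on five nodes.
rng = np.random.default_rng(0)
adjs = [np.triu(rng.integers(0, 2, size=(5, 5)), k=1) for _ in range(4)]
adjs = [a + a.T for a in adjs]
K = network_kernel(adjs, lengthscale=2.0)
```

A matrix such as `K` could then serve as the prior covariance of a Gaussian process over the networks, with a likelihood chosen to suit classification, anomaly detection, or survival outcomes.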

We next consider the task of exploring reaction space in high-throughput chemistry, where the inputs to a reaction are two or more molecules. Our goal is to create a workflow that facilitates quick, low-cost, and effective analysis of reactions. To operationalize this goal, we develop a statistical approach that breaks the analysis into several steps based on four distinct challenges that we identify. Each of these challenges requires careful consideration in creating our analysis plan. For instance, to address the fact that reactions are run on multiwell plates, we formulate our proposal as a constrained optimization problem; then, we leverage the underlying structure by realizing a plate as a bipartite graph, which allows us to reformulate the problem as a maximal edge biclique problem. These solutions are necessary to optimally navigate a large reaction space given limited resources, which is critical in the application of reaction chemistry, for example, to drug discovery.
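To make the biclique reformulation concrete, the sketch below finds a biclique with the most edges in a small bipartite graph by exhaustive search. The reading of the graph is an assumption for illustration only: rows and columns stand for two sets of reactants, and an entry of 1 marks a pairing of interest. The function name `max_edge_biclique` is hypothetical, and the brute-force search is practical only for small inputs since the problem is NP-hard in general; the abstract does not describe the dissertation's own solver.

```python
from itertools import combinations

def max_edge_biclique(adj):
    """Return (edge_count, rows, cols) for a biclique with the most edges.

    adj is a 0/1 matrix given as a list of lists. For every non-empty
    subset of rows, the largest compatible column set is the set of
    columns adjacent to all chosen rows.
    """
    n_rows = len(adj)
    n_cols = len(adj[0]) if n_rows else 0
    best = (0, (), ())
    for k in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            cols = tuple(
                j for j in range(n_cols) if all(adj[i][j] for i in rows)
            )
            size = len(rows) * len(cols)
            if size > best[0]:
                best = (size, rows, cols)
    return best

# Hypothetical 3-by-3 compatibility matrix.
adj = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
best = max_edge_biclique(adj)
```

Here `best` describes a fully connected block of rows and columns, which under the assumed reading corresponds to a set of reactant pairings that could all be run together.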

The final task we consider is the recovery of a network given a sample of noisy unlabeled copies of the network. Toward this end, we make a connection between the noisy network literature and the correlated Erdős–Rényi graph model, which allows us to employ results from graph matching. Research on multiple unlabeled networks has otherwise been underdeveloped but is emerging in areas such as differential privacy and anonymized networks, as well as measurement error in network construction. / 2022-10-25T00:00:00Z
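The data setting described here can be simulated with a short sketch: starting from one underlying network, each copy flips every dyad independently with a small probability and then shuffles the node labels. The independent-flip noise model and the names `noisy_unlabeled_copy` and `flip_prob` are assumptions for illustration; the abstract does not specify the noise model or the recovery procedure, and this code only generates data.

```python
import numpy as np

def noisy_unlabeled_copy(adj, flip_prob, rng):
    """Return a noisy, relabeled copy of an undirected 0/1 network.

    Each dyad (i < j) is flipped independently with probability
    flip_prob, then the node labels are randomly permuted.
    """
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)
    flips = rng.random(len(iu[0])) < flip_prob
    upper = np.where(flips, 1 - adj[iu], adj[iu])
    noisy = np.zeros((n, n), dtype=int)
    noisy[iu] = upper
    noisy = noisy + noisy.T          # symmetrize, diagonal stays zero
    perm = rng.permutation(n)        # hidden relabeling of the nodes
    return noisy[np.ix_(perm, perm)], perm

# An Erdős–Rényi graph as the underlying network, plus three copies.
rng = np.random.default_rng(1)
n, p = 8, 0.4
truth = np.triu((rng.random((n, n)) < p).astype(int), k=1)
truth = truth + truth.T
copies = [noisy_unlabeled_copy(truth, 0.1, rng)[0] for _ in range(3)]
```

Because the permutations are hidden from the analyst, the copies cannot simply be averaged entry by entry, which is why aligning them through graph matching is needed before the underlying network can be estimated.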

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/43216
Date: 26 October 2021
Creators: Josephs, Nathaniel
Contributors: Kolaczyk, Eric D.
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
Rights: Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
