Shuffled linear regression concerns itself with linear models with an unknown correspondence between the input and the output. This correspondence is usually represented by a permutation matrix II*. The model we are interested in has one more complication which is that the design matrix is itself latent and is observed with noise. This is considered as a type of errors-in-variables (EIV) model. Our interest lies in the recovery of the permutation matrix.
We propose an estimator for II* based on the total least squares (TLS) technique, a common method of estimation used in EIV model. The estimation problem can be viewed as approximating one matrix by another of lower rank and the quantity it seeks to minimize is the sum of the smallest singular values squared.
Due to identifiability issue, we evaluate the proposed estimator by the normalized Procrustes quadratic loss which allows for an orthogonal rotation of the estimated design matrix. Our main result provides an upper bound on this quantity which states that it is required that the signal-to-noise ratio to go to infinity in order for the loss to go to zero.
On the computational front, since the problem of permutation recovery is NP-hard to solve, we propose a simple and efficient algorithm named alternating LAP/TLS algorithm (ALTA) to approximate the estimator, and we use it to empirically examine the main result. The main idea of the algorithm is to alternate between estimating the unknown coefficient matrix using the TLS method and estimating the latent permutation matrix by solving a linear assignment problem (LAP) which runs in polynomial time.
Lastly, we propose a hypothesis testing procedure based on graph matching which we apply in the field of digital humanities, on character social networks constructed from novel series.
Identifer | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/47034 |
Date | 27 September 2023 |
Creators | Wang, Qian |
Contributors | Sussman, Daniel L. |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Page generated in 0.0023 seconds