Collecting a sufficient amount of data and centralizing it are both costly and privacy-sensitive operations. These practical concerns arise from the communication costs between data-collecting devices and from the personal nature of the data, such as an end user's text messages. The goal is to train generalizable machine learning models under data constraints, without sharing or transferring the data.
In this thesis, we present solutions to several aspects of learning under data constraints, such as processing and supervision. We focus on federated learning, online meta-learning, and learning generalizable representations, and we provide setting-specific training recipes.
In the first scenario, we tackle a federated learning problem where data is decentralized across different users and must not be centralized. Traditional approaches either ignore the heterogeneity of user data or handle it at the cost of increased communication. Our solution addresses this heterogeneity by imposing a dynamic regularizer that adapts to each user without extra transmission costs, and we establish convergence guarantees theoretically. We extend our ideas to personalized federated learning, where the model is customized to each end user, and to heterogeneous federated learning, where users support different model architectures.
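As an illustration of the idea, the sketch below (PyTorch-style Python) shows how a client-side dynamic regularizer could be folded into a local training objective. The variable names, the linear-plus-quadratic penalty form, and the `alpha` coefficient are assumptions made for illustration; they are not necessarily the exact formulation used in the thesis.

```python
import torch

def local_update(model, server_params, local_grad_state, loader,
                 alpha=0.01, lr=0.1, epochs=1):
    """One client's local round with a dynamic regularizer (illustrative sketch).

    server_params    : flattened copy of the current server model parameters
    local_grad_state : this client's running correction term (same shape),
                       updated each round so the regularizer adapts to the
                       client's own drift (heterogeneity) over time
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(x), y)
            # Flatten current parameters to compare against the server model.
            flat = torch.cat([p.view(-1) for p in model.parameters()])
            # Dynamic regularization: a linear correction term plus a
            # quadratic proximal term toward the server parameters.
            reg = -torch.dot(local_grad_state, flat) \
                  + 0.5 * alpha * torch.sum((flat - server_params) ** 2)
            (loss + reg).backward()
            opt.step()
    # Update the client's correction state from the final parameter drift;
    # only the model itself needs to be transmitted back to the server.
    with torch.no_grad():
        flat = torch.cat([p.view(-1) for p in model.parameters()])
        local_grad_state -= alpha * (flat - server_params)
    return model, local_grad_state
```

Because the correction state stays on the client and is updated locally, the per-round communication is the same as in standard federated averaging.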
In the next scenario, we consider online meta-learning, where there is a single user whose data distribution changes over time. The goal is to adapt to new data distributions using very few labeled examples from each. A naive approach stores data from every distribution so that a model can be retrained from scratch with sufficient data. Our solution instead efficiently summarizes the information in each task's data, so the memory footprint does not scale with the number of tasks.
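A minimal sketch of the memory idea follows, with hypothetical names: instead of storing raw task data, each task contributes a fixed-size statistic (here, a running mean of per-task adaptation directions), so memory stays constant as tasks arrive. This is an illustrative simplification, not the exact summarization mechanism of the thesis.

```python
import numpy as np

class TaskSummary:
    """Fixed-size running summary of tasks seen so far (hypothetical sketch).

    Rather than storing each task's data, we keep a constant-memory
    aggregate, so the footprint is O(dim) regardless of how many tasks
    have been observed.
    """

    def __init__(self, dim):
        self.mean_update = np.zeros(dim)  # aggregated task information
        self.num_tasks = 0

    def absorb(self, task_update):
        """Fold one task's fixed-size statistic into the running summary."""
        self.num_tasks += 1
        self.mean_update += (task_update - self.mean_update) / self.num_tasks

    def adapt(self, meta_params, few_shot_grad, inner_lr=0.1):
        """Adapt to a new task from a few labeled examples plus the summary."""
        return meta_params - inner_lr * (few_shot_grad + self.mean_update)
```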
Lastly, we aim to train generalizable representations given a dataset, in a setting where we have access to a powerful (more complex) teacher model. Traditional methods do not distinguish between data points and force the student to learn all of the teacher's information. Our proposed method focuses on the learnable part of the input space and carefully distills only the attainable information from the teacher, discarding the over-capacity information.
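The rough Python sketch below illustrates one way such selective distillation could look: a per-example weight (a hypothetical attainability score derived from teacher-student disagreement) down-weights teacher targets the student cannot match, so over-capacity information is discarded. The weighting scheme and all names are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def selective_distillation_loss(student_logits, teacher_logits, labels,
                                temperature=2.0, threshold=0.5):
    """Distill from the teacher only where its information seems attainable.

    Examples on which teacher and student disagree too strongly are treated
    as over-capacity information and are down-weighted, so the student
    focuses on the learnable part of the input space.
    """
    t = temperature
    student_log_prob = F.log_softmax(student_logits / t, dim=1)
    teacher_prob = F.softmax(teacher_logits / t, dim=1)

    # Per-example KL divergence between teacher and student predictions.
    kl = F.kl_div(student_log_prob, teacher_prob, reduction="none").sum(dim=1)

    # Hypothetical attainability score: small KL -> attainable -> weight near 1,
    # large KL -> likely over-capacity -> weight near 0.
    weight = torch.exp(-kl / threshold).detach()

    distill = (weight * kl).mean() * (t ** 2)
    supervised = F.cross_entropy(student_logits, labels)
    return supervised + distill
```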
We compare our methods with state-of-the-art baselines in each setting and show significant performance improvements. Finally, we discuss potential directions for future work.
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/46632 |
Date | 30 August 2023 |
Creators | Acar, Durmuş Alp Emre |
Contributors | Saligrama, Venkatesh |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Rights | Attribution-NonCommercial-ShareAlike 4.0 International, http://creativecommons.org/licenses/by-nc-sa/4.0/ |