
On Mixup Training of Neural Networks

Deep neural networks are powerful machine learning tools. Despite their capacity to fit the training data, they often perform poorly on unseen data. A variety of regularization techniques have been proposed to improve the generalization of deep neural networks. This thesis studies a simple yet effective regularization scheme, Mixup, which was proposed recently. Briefly, Mixup creates synthetic examples by linearly interpolating random pairs of real examples and uses these synthetic examples for training. Although Mixup has been empirically shown to be effective on various classification tasks with neural network models, its working mechanism and possible limitations are not yet well understood.
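
The interpolation scheme described above can be sketched as follows. This is a minimal illustration of standard Mixup, with the mixing coefficient drawn from a Beta(alpha, alpha) distribution; the function name mixup_batch, the value of alpha, and the toy data are our choices for illustration, not details taken from the thesis.

```python
# Minimal sketch of standard Mixup: synthetic examples are convex combinations
# of random pairs of real examples and their one-hot labels.
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Return a mixed batch (x_tilde, y_tilde) from inputs x and one-hot labels y_onehot."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)                               # interpolation coefficient in [0, 1]
    perm = rng.permutation(len(x))                             # random pairing of examples
    x_tilde = lam * x + (1.0 - lam) * x[perm]                  # interpolate inputs
    y_tilde = lam * y_onehot + (1.0 - lam) * y_onehot[perm]    # interpolate labels
    return x_tilde, y_tilde

# Toy usage: 4 examples with 3 features, 2 classes.
x = np.random.randn(4, 3).astype(np.float32)
y = np.eye(2, dtype=np.float32)[[0, 1, 1, 0]]
x_mix, y_mix = mixup_batch(x, y)
```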

One potential problem of Mixup is known as manifold intrusion, in which the synthetic examples "intrude" on the data manifolds of the real data, creating conflicts between the synthetic labels and the ground-truth labels of the synthetic examples. The first part of this thesis investigates strategies for resolving the manifold intrusion problem. We focus on two strategies. The first, which we call "relabelling", attempts to find better labels for the synthetic data; the second, which we call "cautious mixing", carefully selects the interpolation parameters used to generate the synthetic examples. Through extensive experiments over several design choices, we observe that the "cautious mixing" strategy appears to perform better.
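
The abstract does not spell out how the interpolation parameters are selected. Purely as a hypothetical illustration of the "cautious mixing" idea, the sketch below keeps the mixing coefficient close to 0 or 1 so that each synthetic example stays near one of its real parents and is less likely to intrude on a foreign class manifold; the helper name cautious_lambda, the Beta parameter, and the margin are all our assumptions, not the thesis's actual design.

```python
# Hypothetical illustration of "cautious" selection of the interpolation
# coefficient: reject values of lambda that fall in the middle of [0, 1],
# so every synthetic example remains close to one of its two real parents.
import numpy as np

def cautious_lambda(alpha=1.0, margin=0.2, rng=None):
    """Sample lambda from Beta(alpha, alpha), keeping it within `margin` of 0 or 1."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        lam = rng.beta(alpha, alpha)
        if lam <= margin or lam >= 1.0 - margin:
            return lam
```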

The second part of this thesis reports a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models begins to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help explain this behavior, we show theoretically that Mixup training may introduce undesired data-dependent label noise into the synthetic data. By analyzing a least-squares regression problem with a random feature model, we explain why noisy labels may cause the U-shaped curve to occur: Mixup improves generalization by fitting the clean patterns in the early stage of training, but as training progresses, the model overfits the noise in the synthetic data. Extensive experiments on a variety of benchmark datasets validate this explanation.
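
To make the label-noise claim more concrete, the Mixup target can be rewritten as the clean label of one parent example plus a pair-dependent perturbation. The decomposition below uses the standard Mixup notation and is our illustration rather than a formula quoted from the thesis.

```latex
% Mixup target written so that the deviation from the clean label of the
% nearer parent example is explicit (notation is ours, for illustration):
\[
  \tilde{x} = \lambda x_i + (1-\lambda) x_j, \qquad
  \tilde{y} = \lambda y_i + (1-\lambda) y_j
            = y_i + \underbrace{(1-\lambda)\,(y_j - y_i)}_{\text{data-dependent label noise}} .
\]
% When \lambda is close to 1, \tilde{x} lies near x_i, yet its label is
% perturbed by (1-\lambda)(y_j - y_i), which is nonzero whenever the paired
% examples carry different labels; because the perturbation depends on the
% sampled pair, the resulting label noise is data-dependent.
```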

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/44383
Date: 14 December 2022
Creators: Liu, Zixuan
Contributors: Mao, Yongyi
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d'Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/
