Reweighing auxiliary losses in supervised learning

Publication
In AAAI 2023

Apart from standard supervised learning with hard labels, auxiliary losses are often used in many supervised learning settings to improve a model's generalisation. For example, knowledge distillation adds a second, teacher-mimicking loss to the training of a model, where the teacher may be a pretrained model that outputs a richer distribution over labels. Similarly, in settings with limited labelled data, weak labelling information is used in the form of labelling functions. Auxiliary losses are introduced here to cope with labelling functions that may be noisy, rule-based approximations of the true labels. We tackle the problem of learning to combine these losses in a principled manner. We introduce AMAL, which learns instance-specific weights via meta-learning on a validation metric to achieve an optimal mixing of losses. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL provides noticeable gains over competitive baselines. We empirically analyze our method and share insights into the mechanisms through which it provides performance gains.
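The mechanism behind such instance-specific mixing can be illustrated with a short, self-contained sketch: per-instance weights between the hard-label loss and the auxiliary (teacher-mimicking) loss are treated as meta-parameters and updated by backpropagating a clean validation loss through a one-step look-ahead of the model. The snippet below is a minimal illustration under simplifying assumptions (a toy linear model, random data, a sigmoid parameterisation of the weights, and hand-picked learning rates); it is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Toy setup (all hypothetical): a linear classifier, a training batch with hard
# labels and teacher soft labels, and a small clean validation batch.
torch.manual_seed(0)
d, k, n_train, n_val = 10, 3, 32, 16
W = torch.zeros(d, k, requires_grad=True)                    # model parameters
x_tr, y_tr = torch.randn(n_train, d), torch.randint(0, k, (n_train,))
t_tr = F.softmax(torch.randn(n_train, k), dim=1)             # teacher soft labels
x_val, y_val = torch.randn(n_val, d), torch.randint(0, k, (n_val,))

# Instance-specific mixing weights between the hard-label loss and the
# auxiliary (teacher-mimicking) loss; these are the meta-parameters.
alpha = torch.full((n_train,), 0.5, requires_grad=True)

opt = torch.optim.SGD([W], lr=0.1)
meta_opt = torch.optim.SGD([alpha], lr=0.05)
inner_lr = 0.1

for step in range(100):
    # --- Meta step: update the mixing weights using a clean validation batch ---
    logits = x_tr @ W
    ce = F.cross_entropy(logits, y_tr, reduction="none")              # hard-label loss
    kd = F.kl_div(F.log_softmax(logits, dim=1), t_tr,
                  reduction="none").sum(1)                            # teacher-mimicking loss
    a = torch.sigmoid(alpha)                                          # keep weights in (0, 1)
    mixed = (a * ce + (1.0 - a) * kd).mean()

    # Differentiable one-step look-ahead of the model parameters.
    grad_W, = torch.autograd.grad(mixed, W, create_graph=True)
    W_look = W - inner_lr * grad_W

    # Validation loss of the looked-ahead model; its gradient w.r.t. alpha
    # indicates how each instance's two losses should be re-weighed.
    val_loss = F.cross_entropy(x_val @ W_look, y_val)
    meta_opt.zero_grad()
    val_loss.backward()
    meta_opt.step()

    # --- Model step: ordinary update with the freshly updated mixing weights ---
    a = torch.sigmoid(alpha).detach()
    logits = x_tr @ W
    ce = F.cross_entropy(logits, y_tr, reduction="none")
    kd = F.kl_div(F.log_softmax(logits, dim=1), t_tr, reduction="none").sum(1)
    loss = (a * ce + (1.0 - a) * kd).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same bi-level pattern carries over to any differentiable model and to settings with more than one auxiliary loss per instance, such as multiple labelling functions.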

Ayush Maheshwari
(Now at Vizzhy Inc.) Graduate student in NLP/ML

My research interests include machine learning, NLP and machine translation.