A Google TechTalk, presented by Ashwinee Panda & Xinyu Tang (Princeton University), 2023/03/29

ABSTRACT: Differential privacy (DP) has become the de facto measure of privacy. By training machine learning models with Differentially Private Stochastic Gradient Descent (DP-SGD), we can provide provable guarantees that the trained model does not leak too much information about its training data. However, DP-SGD can compromise the accuracy of machine learning models because gradient clipping introduces bias and adding Gaussian noise increases the variance of each gradient update. In this talk we present two algorithms, DP-RAFT and DOPE-SGD, that leverage public data to improve the privacy-utility tradeoff of DP-SGD. When ample public data is available to pretrain a model, we propose DP-RAFT, a recipe that privately selects the best hyperparameters for fine-tuning to maximize the signal-to-noise ratio of private updates. When only limited public data is available, we propose DOPE-SGD, an algorithm that applies advanced data augmentation to enhance the quality of public data and incorporates gradients from the (augmented) public data in clipping to reduce the effect of the added noise in the privatized gradients.
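To make the bias/variance point concrete, here is a minimal sketch of a single DP-SGD update step. This is illustrative only, not the speakers' implementation; names such as `per_example_grads`, `clip_norm`, and `noise_multiplier` are assumptions chosen for clarity.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.0, lr=0.1, rng=None):
    """One privatized update: clip each example's gradient, sum,
    add Gaussian noise calibrated to the clipping norm, average,
    and take a gradient step. (Illustrative sketch.)"""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Clipping bounds each example's contribution (source of bias).
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    summed = np.sum(clipped, axis=0)
    # Gaussian noise scaled to the clipping norm (source of variance).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    noisy_mean = (summed + noise) / len(per_example_grads)
    return params - lr * noisy_mean

# Toy usage: two per-example gradients for a 3-parameter model.
params = np.zeros(3)
grads = [np.array([0.5, -1.2, 0.3]), np.array([2.0, 0.1, -0.4])]
params = dp_sgd_step(params, grads)
```

Both algorithms in the talk target the two noise-related terms above: DP-RAFT tunes the fine-tuning setup so the clipped, averaged signal dominates the noise, while DOPE-SGD uses (augmented) public gradients inside the clipping step to shrink the effective noise.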