Machine Unlearning

With the progress of deep learning, there is now widespread use of AI technologies.

With that comes the propagation and amplification of biases, and breaches of user privacy.

What if we could somehow make AI models "forget" the data they were trained on?

Machine Unlearning is a subfield of ML that aims to remove the influence of a specific subset of training examples (the forget set) from a trained model.

The ideal unlearning algorithm would (1) remove that influence while (2) maintaining accuracy on the rest of the training set (the retain set) and generalization to held-out examples.

The naive way is to retrain the model on a new dataset that excludes the samples from the forget set, but this can be computationally expensive.

The ideal algorithm will use the pre-trained model as a starting point, and efficiently make adjustments to remove the influence of the forget set.
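
As a toy illustration of adjusting a trained model instead of retraining it, consider a "model" whose only parameter is the mean of its training values. Here the forget set's contribution can be subtracted exactly. The `RunningMean` class and its `forget` method are hypothetical names for this sketch; deep networks generally do not admit such a cheap exact update, which is what makes the problem hard.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Toy model: stores the sum and count of the training values."""
    total: float = 0.0
    count: int = 0

    def fit(self, xs):
        # "Training" from scratch: recompute the statistics from the data.
        self.total = float(sum(xs))
        self.count = len(xs)
        return self

    def forget(self, xs):
        # Unlearning: subtract the forgotten examples' contribution,
        # starting from the already-trained statistics.
        self.total -= float(sum(xs))
        self.count -= len(xs)
        return self

    @property
    def value(self):
        return self.total / self.count


train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forget = [5.0, 6.0]
retain = [1.0, 2.0, 3.0, 4.0]

retrained = RunningMean().fit(retain)                # naive: retrain without the forget set
unlearned = RunningMean().fit(train).forget(forget)  # adjust the trained model instead
assert unlearned.value == retrained.value
```

The adjustment touches only the forget set, while retraining has to revisit every retained example.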


Machine unlearning (MU) goes beyond protecting user privacy. It can:

  1. erase inaccurate/outdated information (due to errors in labelling or changes in environment)
  2. remove harmful, manipulated, or outlier data

It's also related to other areas of ML:

  • differential privacy: guarantees that no particular training example has too large an influence on the model (a stronger goal than unlearning)
  • lifelong learning: models that can learn continuously while maintaining previously acquired skills
  • fairness: correcting unfair biases or disparate treatment of members of different groups


An unlearning algorithm takes a pre-trained model and one or more training samples to unlearn (the forget set).

From the model, forget set, and retain set, the unlearning algorithm produces an unlearned model.
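
The input/output contract described above can be sketched as a function type. The names `UnlearnFn` and `retrain_reference` are assumptions made for illustration, not part of any library or of the challenge code. The only implementation shown is the retrain-from-scratch reference, which ignores the pre-trained model and simply trains on the retain set.

```python
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")
Example = TypeVar("Example")

# Hypothetical signature:
# (pre-trained model, forget set, retain set) -> unlearned model
UnlearnFn = Callable[[Model, Sequence[Example], Sequence[Example]], Model]


def retrain_reference(train_fn):
    """Wrap a training function as the retrain-from-scratch reference unlearner."""
    def unlearn(model, forget_set, retain_set):
        # The pre-trained model and the forget set are deliberately unused:
        # the result depends on the retain set only.
        return train_fn(retain_set)
    return unlearn


def mean_model(xs):
    """Toy training function: the 'model' is just the mean of the data."""
    return sum(xs) / len(xs)


unlearn = retrain_reference(mean_model)
pretrained = mean_model([1, 2, 3, 4, 5, 6])
unlearned = unlearn(pretrained, [5, 6], [1, 2, 3, 4])
```

A practical unlearning algorithm would have the same signature but would start from `model` rather than discard it.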

The goal: unlearned model === model trained without forget set
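
One hedged way to write this goal down, since training is randomized, is as equality in distribution rather than equality of a single set of weights. The symbols are assumptions introduced here: $\mathcal{A}$ is the training algorithm, $\mathcal{U}$ the unlearning algorithm, $D$ the full training set, $D_f$ the forget set, and $D_r$ the retain set.

```latex
\[
  \mathcal{U}\bigl(\mathcal{A}(D),\, D_f,\, D_r\bigr)
  \;\overset{d}{\approx}\;
  \mathcal{A}(D_r),
  \qquad D_r = D \setminus D_f
\]
```

Here $\overset{d}{\approx}$ means the two sides are (approximately) identically distributed over the randomness of training and unlearning, such as initialization and data ordering.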


Unlearning is hard. The first challenge is that it involves several conflicting objectives:

  1. forgetting requested data
  2. maintaining model's utility (accuracy on retained and held-out data)
  3. efficiency

Existing algorithms make different tradeoffs:

  • full retraining = forget ✅, utility ✅, efficiency ❌
  • adding noise to weights = forget ✅, utility ❌
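
The noise tradeoff can be seen in a minimal sketch. Everything here is assumed for illustration: the helper names, the toy linear classifier, the weights, and the noise scales. Gaussian noise is used because it is the simplest choice; the text does not say which noise distribution is meant.

```python
import random


def add_gaussian_noise(weights, sigma, seed=0):
    """Return a copy of `weights` with i.i.d. N(0, sigma^2) noise added to each entry."""
    rng = random.Random(seed)
    return [w + rng.gauss(0.0, sigma) for w in weights]


def accuracy(weights, examples):
    """Accuracy of the linear rule sign(w . x) on (features, label) pairs, labels in {-1, +1}."""
    correct = 0
    for x, y in examples:
        score = sum(w * xi for w, xi in zip(weights, x))
        prediction = 1 if score >= 0 else -1
        correct += prediction == y
    return correct / len(examples)


# Hypothetical "trained" weights and a tiny evaluation set.
weights = [2.0, -1.0]
examples = [((1.0, 0.0), 1), ((0.0, 1.0), -1), ((2.0, 1.0), 1), ((-1.0, 0.5), -1)]

for sigma in (0.0, 1.0, 10.0):
    noisy = add_gaussian_noise(weights, sigma)
    print(f"sigma={sigma}: accuracy={accuracy(noisy, examples)}")
```

With enough noise the weights carry little information about any training example, the forget set included, but the same noise also erodes whatever made the model useful.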

The second challenge is inconsistent evaluation.


The first NeurIPS 2023 Machine Unlearning Challenge was announced to advance this field.

A starting kit and a sample notebook were also released.


How is forgetting evaluated?

Using tools inspired by Membership Inference Attacks (MIAs) such as LiRA.

MIAs were first developed in the privacy and security literature with the goal of inferring which examples were part of the training set.

Intuitively, if unlearning was successful, the model retains no trace of the forget set, so an MIA should fail: the attacker will be unable to infer that the forget set was part of the original training set.
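
To make the intuition concrete, here is a sketch of a loss-threshold attack, which is much simpler than LiRA and is not the challenge's evaluation. It assumes we already have the unlearned model's per-example losses on the forget set and on an equally sized held-out set; the function name and the loss values are made up for illustration. The attacker guesses "member" when the loss is at or below a threshold, and we report the best accuracy over all thresholds.

```python
def best_threshold_attack_accuracy(forget_losses, heldout_losses):
    """Best accuracy of the rule 'loss <= t means member' over all thresholds t.

    The threshold is chosen on the same data it is scored on, so this is an
    optimistic estimate of what the attacker can achieve.
    """
    candidates = sorted(set(forget_losses) | set(heldout_losses))
    n = len(forget_losses) + len(heldout_losses)
    best = 0.0
    # -inf covers the rule "never predict member".
    for t in [float("-inf")] + candidates:
        true_members = sum(loss <= t for loss in forget_losses)
        true_nonmembers = sum(loss > t for loss in heldout_losses)
        best = max(best, (true_members + true_nonmembers) / n)
    return best


# Made-up losses: the model still fits the forget set much better than unseen data.
leaky = best_threshold_attack_accuracy([0.1, 0.2, 0.1], [1.0, 1.2, 0.9])

# Made-up losses: forget and held-out examples look the same to the model.
clean = best_threshold_attack_accuracy([0.5, 0.5], [0.5, 0.5])

print(leaky, clean)
```

With equally sized sets, an accuracy near 0.5 means the attack does no better than guessing, which is what successful unlearning should produce; an accuracy near 1.0 means the forget set is still clearly recognizable.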

In addition, the distributions of retrained models and unlearned models will be compared. For an ideal unlearning algorithm, the two will be indistinguishable.
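
The text does not say how the two distributions are compared. As one possible sketch, assume each model is summarized by a single number (for example its loss on one forget example) and that many retrained and many unlearned models are available from runs with different seeds. The two samples can then be compared with the two-sample Kolmogorov-Smirnov statistic, the largest gap between their empirical CDFs. The choice of statistic, the scalar summary, and the numbers below are all assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gaps = []
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        gaps.append(abs(cdf_a - cdf_b))
    return max(gaps)


# Made-up per-model losses on one forget example, one value per training run.
retrained_losses = [0.90, 1.10, 1.00, 0.95, 1.05]
unlearned_losses = [0.20, 0.25, 0.30, 0.22, 0.28]

print(ks_statistic(retrained_losses, unlearned_losses))
```

A statistic near 0 means the two populations are hard to tell apart on this summary; a value near 1 means they are easily separated, so the unlearned models do not behave like retrained ones.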