Stars
Official GitHub page for the paper "Evaluating Deep Unlearning in Large Language Models"
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
New ways of breaking app-integrated LLMs
An easy-to-use Python framework to generate adversarial jailbreak prompts.
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
Universal and Transferable Attacks on Aligned Language Models
A PyTorch implementation of Model Agnostic Meta-Learning (MAML) that faithfully reproduces the results from the original paper.
A new adversarial purification method that uses the forward and reverse processes of diffusion models to remove adversarial perturbations.
Official Code for ICLR2022 Paper: Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap
A loss function (Weighted Hausdorff Distance) for object localization in PyTorch
Codes for NeurIPS 2020 paper "Adversarial Weight Perturbation Helps Robust Generalization"
A PyTorch implementation of the method found in "Adversarially Robust Few-Shot Learning: A Meta-Learning Approach"
Official code for dynamic convolution decomposition
Code and data for the ICLR 2021 paper "Perceptual Adversarial Robustness: Defense Against Unseen Threat Models".
Unofficial implementation of the DeepMind papers "Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples" & "Fixing Data Augmentation to Improve Adversarial Robustn…
Empirical tricks for training robust models (ICLR 2021)
Code relating to "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
[ICML 2021] This is the official GitHub repo for training L_inf dist nets with high certified accuracy.
A Python package to assess and improve fairness of machine learning models.
This repository contains implementations and illustrative code to accompany DeepMind publications
Systematic Evaluation of Membership Inference Privacy Risks of Machine Learning Models