
Official GitHub page for the paper "Evaluating Deep Unlearning in Large Language Models"

Jupyter Notebook 14 1 Updated Feb 15, 2025
Jupyter Notebook 18 3 Updated Jun 16, 2024
Python 15 7 Updated Dec 10, 2022

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Python 25 1 Updated Jul 9, 2024

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.

Jupyter Notebook 2,139 263 Updated Mar 18, 2025

New ways of breaking app-integrated LLMs

Jupyter Notebook 1,903 130 Updated Jun 17, 2023

An easy-to-use Python framework to generate adversarial jailbreak prompts.

Python 590 48 Updated Sep 2, 2024

Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives

Python 67 6 Updated Feb 22, 2024

[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

Jupyter Notebook 107 11 Updated Sep 12, 2024
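The multi-hop evaluation idea behind MQuAKE can be illustrated with a toy fact store: after a single fact is edited, every multi-hop chain passing through that fact should yield the updated answer. The entities and relations below are hypothetical placeholders, not MQuAKE data.

```python
# Toy knowledge base of (subject, relation) -> object triples.
# All entity and relation names are made up, for illustration only.
kb = {("A", "r1"): "B", ("B", "r2"): "C"}

def multi_hop(entity, relations, kb):
    """Follow a chain of relations from a starting entity."""
    for r in relations:
        entity = kb[(entity, r)]
    return entity

def edit_fact(kb, subject, relation, new_object):
    """Apply a single knowledge edit, as an editing method would."""
    kb = dict(kb)
    kb[(subject, relation)] = new_object
    return kb

print(multi_hop("A", ["r1", "r2"], kb))    # "C" before the edit
edited = edit_fact(kb, "B", "r2", "D")
print(multi_hop("A", ["r1", "r2"], edited))  # "D": the 2-hop answer reflects the edit
```

MQuAKE's point is that many editing methods update the single-hop answer but fail exactly this multi-hop consistency check.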

Universal and Transferable Attacks on Aligned Language Models

Python 3,782 510 Updated Aug 2, 2024
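The universal attack in that repo (GCG) searches over discrete suffix tokens with gradient-guided coordinate swaps. A gradient-free caricature of the coordinate-swap loop, with a made-up score function standing in for the target model's loss, might look like:

```python
def greedy_coordinate_search(score, vocab, length, iters=5):
    """Greedy coordinate descent over a discrete token suffix: on each
    pass, replace each position with the vocab token that maximizes the
    score. (Real GCG ranks candidate swaps using token-embedding
    gradients; this sketch just brute-forces the vocabulary.)"""
    suffix = [vocab[0]] * length
    for _ in range(iters):
        for i in range(length):
            suffix[i] = max(vocab, key=lambda t: score(suffix[:i] + [t] + suffix[i + 1:]))
    return suffix

# Hypothetical objective: match a target string character by character.
target = list("cab")
score = lambda s: sum(a == b for a, b in zip(s, target))
print("".join(greedy_coordinate_search(score, list("abc"), 3)))  # "cab"
```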

A PyTorch implementation of Model Agnostic Meta-Learning (MAML) that faithfully reproduces the results from the original paper.

Python 223 34 Updated Sep 2, 2024
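MAML's two-level structure (an inner adaptation step per task, an outer update of the shared initialization) can be sketched on scalar toy tasks where every gradient is available in closed form. The quadratic task losses below are an illustrative assumption, not the paper's benchmarks.

```python
def inner_adapt(theta, a, alpha=0.1):
    """One inner-loop gradient step on the task loss (theta - a)^2."""
    return theta - alpha * 2.0 * (theta - a)

def meta_gradient(theta, tasks, alpha=0.1):
    """Gradient of the post-adaptation loss w.r.t. the initialization.

    With theta' = theta - 2*alpha*(theta - a), the chain rule gives
    d/dtheta (theta' - a)^2 = 2*(theta' - a)*(1 - 2*alpha).
    """
    g = 0.0
    for a in tasks:
        theta_prime = inner_adapt(theta, a, alpha)
        g += 2.0 * (theta_prime - a) * (1.0 - 2.0 * alpha)
    return g / len(tasks)

def maml_train(tasks, theta=0.0, meta_lr=0.2, steps=200, alpha=0.1):
    """Outer loop: descend the meta-gradient to find an initialization
    that adapts well to every task after one inner step."""
    for _ in range(steps):
        theta -= meta_lr * meta_gradient(theta, tasks, alpha)
    return theta

print(round(maml_train([1.0, 2.0, 3.0]), 3))  # 2.0, the task mean
```

For these symmetric quadratic tasks the learned initialization is the task mean, which is exactly the point one inner step can adapt from equally well for all tasks.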

A new adversarial purification method that uses the forward and reverse processes of diffusion models to remove adversarial perturbations.

Python 288 34 Updated Jan 29, 2023
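The purification recipe (noise the input with the forward diffusion, then denoise with the reverse process) reduces to closed form when the clean-data distribution is assumed standard normal, which makes the perturbation-shrinking effect easy to see. This 1-D toy stands in for the paper's image-space diffusion model.

```python
import random

def purify(x_adv, alpha_bar=0.5, rng=random.Random(0)):
    """One forward-diffusion step followed by one posterior-mean denoise,
    assuming clean data ~ N(0, 1). Then x_t is marginally N(0, 1) too,
    and the posterior mean is E[x_0 | x_t] = sqrt(alpha_bar) * x_t."""
    eps = rng.gauss(0.0, 1.0)
    x_t = alpha_bar ** 0.5 * x_adv + (1 - alpha_bar) ** 0.5 * eps  # forward noising
    return alpha_bar ** 0.5 * x_t                                  # denoise toward the prior

# Averaged over noise draws, an adversarial offset is shrunk by alpha_bar.
rng = random.Random(0)
mean = sum(purify(3.0, rng=rng) for _ in range(20000)) / 20000
print(mean)  # ~1.5 = alpha_bar * x_adv
```

The amount of forward noise trades off how much of the adversarial perturbation is washed out against how much clean signal survives, which is the central tuning knob in the paper.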

Official Code for ICLR2022 Paper: Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap

Jupyter Notebook 28 3 Updated Sep 2, 2022

A loss function (Weighted Hausdorff Distance) for object localization in PyTorch

Python 90 22 Updated Jun 29, 2018
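The weighted Hausdorff distance generalizes the averaged Hausdorff distance between point sets to probability maps so it becomes differentiable; the unweighted averaged variant, which the paper builds on, is short enough to write out directly:

```python
import math

def averaged_hausdorff(A, B):
    """Averaged Hausdorff distance between two non-empty 2-D point sets:
    mean nearest-neighbor distance from A to B, plus the same from B to A."""
    d = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    ab = sum(min(d(a, b) for b in B) for a in A) / len(A)
    ba = sum(min(d(a, b) for a in A) for b in B) / len(B)
    return ab + ba

print(averaged_hausdorff([(0, 0)], [(3, 4)]))  # 10.0 (5.0 in each direction)
```

The paper's weighted form replaces the hard point set produced by the network with a per-pixel probability map, so the `min` and mean become soft, trainable quantities.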

Codes for NeurIPS 2020 paper "Adversarial Weight Perturbation Helps Robust Generalization"

Python 177 19 Updated Feb 18, 2021
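Adversarial weight perturbation flattens the weight-loss landscape by taking each descent step from an adversarially perturbed copy of the weights. On a scalar toy loss (an illustrative stand-in for a network's training loss), one AWP step looks like:

```python
def loss_grad(w):
    """Gradient of a toy loss L(w) = (w - 1)^2."""
    return 2.0 * (w - 1.0)

def awp_step(w, lr=0.1, gamma=0.05):
    """One AWP update: perturb the weights along the normalized gradient
    direction (scaled to the weight magnitude), then apply the gradient
    computed at the perturbed point back to the original weights."""
    g = loss_grad(w)
    if g == 0.0:
        return w
    v = gamma * max(abs(w), 1e-8) * (g / abs(g))  # worst-case weight perturbation
    return w - lr * loss_grad(w + v)

w = 5.0
for _ in range(200):
    w = awp_step(w)
print(w)  # settles within roughly gamma of the minimum at 1.0
```

In the paper the perturbation is itself found by a few gradient-ascent steps over the full weight vector, and the step is combined with adversarial perturbation of the inputs.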
Jupyter Notebook 58 23 Updated Sep 11, 2023

A PyTorch implementation of the method found in "Adversarially Robust Few-Shot Learning: A Meta-Learning Approach"

Python 50 10 Updated Oct 9, 2020

Official code for dynamic convolution decomposition

Python 131 15 Updated Nov 22, 2021

Code and data for the ICLR 2021 paper "Perceptual Adversarial Robustness: Defense Against Unseen Threat Models".

Python 55 10 Updated Jan 18, 2022

Unofficial implementation of the DeepMind papers "Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples" & "Fixing Data Augmentation to Improve Adversarial Robustness"

Python 95 12 Updated Mar 4, 2022

Empirical tricks for training robust models (ICLR 2021)

Python 250 26 Updated May 25, 2023

Code for "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"

Python 683 116 Updated May 16, 2024
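The parameter-free attacks in that ensemble (AutoAttack) are built on projected gradient steps. The core PGD loop, shown here on a 1-D toy loss rather than AutoAttack's APGD with its adaptive step sizes, is:

```python
def pgd_attack(loss_grad, x0, eps=0.3, alpha=0.1, steps=10):
    """Projected gradient ascent on the loss within an L_inf ball of
    radius eps around the clean input x0."""
    x = x0
    for _ in range(steps):
        x += alpha * (1.0 if loss_grad(x) > 0 else -1.0)  # signed gradient step
        x = max(x0 - eps, min(x0 + eps, x))               # project back into the ball
    return x

# Toy loss (x - 2)^2: within [-0.3, 0.3] it is maximized at the ball's
# lower edge, and PGD walks there and stays.
print(pgd_attack(lambda x: 2.0 * (x - 2.0), 0.0))  # -0.3
```

The ensemble's contribution is to remove PGD's hand-tuned hyperparameters and combine several such attacks so that robustness numbers cannot be inflated by a weak evaluation.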

[ICML 2021] Official GitHub repo for training L_inf-dist nets with high certified accuracy.

Python 41 7 Updated Mar 16, 2022

A Python package to assess and improve fairness of machine learning models.

Python 2,034 452 Updated Mar 20, 2025
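One of the simplest group-fairness metrics such a package exposes, the demographic parity difference (the gap between groups' selection rates), can be computed by hand to see what the library measures. The data below is made up for illustration.

```python
def demographic_parity_difference(y_pred, groups):
    """Largest gap between per-group selection rates (fraction of
    positive predictions), a standard demographic-parity metric."""
    by_group = {}
    for yhat, g in zip(y_pred, groups):
        by_group.setdefault(g, []).append(yhat)
    rates = [sum(v) / len(v) for v in by_group.values()]
    return max(rates) - min(rates)

# Group "a" is selected 75% of the time, group "b" only 25%: gap of 0.5.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # 0.5
```

A value of 0 means all groups are selected at the same rate; the library additionally provides mitigation algorithms that constrain a model toward such parity.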
Jupyter Notebook 2 Updated May 8, 2021

This repository contains implementations and illustrative code to accompany DeepMind publications

Jupyter Notebook 13,632 2,644 Updated Nov 18, 2024

Systematic Evaluation of Membership Inference Privacy Risks of Machine Learning Models

Python 125 19 Updated Apr 9, 2024
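The membership-inference risk that this benchmark measures can be demonstrated with the most basic attack of this kind: threshold the model's confidence on each example, since overfit models tend to be more confident on their training members. The confidence values below are synthetic.

```python
def confidence_attack(confidences, threshold=0.9):
    """Predict 'training member' when model confidence exceeds a threshold."""
    return [c >= threshold for c in confidences]

# Synthetic confidences: members (overfit, high) vs. non-members (lower).
member_conf     = [0.99, 0.97, 0.95, 0.92]
non_member_conf = [0.80, 0.60, 0.93, 0.70]
guesses = confidence_attack(member_conf + non_member_conf)
labels  = [True] * 4 + [False] * 4
accuracy = sum(g == l for g, l in zip(guesses, labels)) / len(labels)
print(accuracy)  # 0.875: well above chance, exposing membership leakage
```

The benchmark's systematic evaluation goes further, calibrating per-class thresholds and combining several signals (confidence, entropy, loss) to report worst-case privacy risk.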