SJSoJSooJ

Robust recipes to align language models with human and AI preferences

Python · 5,072 stars · 436 forks · Updated Nov 21, 2024

PyTorch implementation of adversarial attacks [torchattacks]

Python · 1,973 stars · 359 forks · Updated Jun 29, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook · 581 stars · 80 forks · Updated Aug 16, 2024

Universal and Transferable Attacks on Aligned Language Models

Python · 3,785 stars · 510 forks · Updated Aug 2, 2024

Python 1 Updated Jun 18, 2024