Seungwon Jeong SJSoJSooJ

Stars

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

Python · 5,072 stars · 436 forks · Updated Nov 21, 2024

Harry24k / adversarial-attacks-pytorch

PyTorch implementation of adversarial attacks [torchattacks]

Python · 1,973 stars · 359 forks · Updated Jun 29, 2024

centerforaisafety / HarmBench

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook · 581 stars · 80 forks · Updated Aug 16, 2024

llm-attacks / llm-attacks

Universal and Transferable Attacks on Aligned Language Models

Python · 3,785 stars · 510 forks · Updated Aug 2, 2024

CSID-DGU / 2024-1-OSS-team-2-moeum

Python · 1 star · Updated Jun 18, 2024