Stars
Code for the NeurIPS 2024 submission: "DAGER: Extracting Text from Gradients with Language Model Priors"
Papers and resources related to the security and privacy of LLMs 🤖
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
A Synthetic Dataset for Personal Attribute Inference (NeurIPS'24 D&B)
[arXiv:2411.10023] "Model Inversion Attacks: A Survey of Approaches and Countermeasures"
Code for replicating experiments in our paper (accepted by AAAI-24).
[IJCAI-2021] Contrastive Model Inversion for Data-Free Knowledge Distillation
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
LAMP: Extracting Text from Gradients with Language Model Priors (NeurIPS '22)
Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"
Learning Sparse Neural Networks through L0 regularization
This repository contains the code for analyzing the leakage of personally identifiable information (PII) from the output of next-word-prediction language models.
This project explores training data extraction attacks on the LLaMa 7B, GPT-2XL, and GPT-2-IMDB models to discover memorized content using perplexity, perturbation scoring metrics, and large scale …
😜Contrastive Learning of Sentence Embeddings using LoRA (EECS487 final project)
Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Curated list of project-based tutorials
This project shares the technical principles behind large language models along with practical experience (LLM engineering and real-world application deployment).
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights i…
Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
An easy-to-use federated learning platform
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"