
Introduction

Code for "CODECIPHER: LEARNING TO OBFUSCATE SOURCE CODE AGAINST LLMS". CodeCipher is a novel method that protects code privacy while preserving the original responses of large language models (LLMs). It transforms the LLM's embedding matrix so that each row of the transformed matrix corresponds to a different token in the original matrix. This creates a token-to-token confusion mapping that obfuscates the source code. The new embedding matrix is optimized by minimizing a task-specific loss function.
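A token-to-token confusion mapping can be illustrated with a small NumPy sketch. It assumes that, after optimization, each row of the learned embedding matrix is snapped to a distinct token of the original matrix by cosine similarity, using a greedy one-to-one assignment. Obfuscating code then means replacing every token id `t` with `mapping[t]`. `confusion_mapping` and `obfuscate` are hypothetical names, not functions from this repository, and the paper's actual procedure may differ.

```python
import numpy as np


def confusion_mapping(original: np.ndarray, learned: np.ndarray) -> np.ndarray:
    """Map each row of `learned` to a distinct row (token id) of `original`.

    Hypothetical helper: a greedy one-to-one assignment by cosine similarity.
    Rows are processed in order, and each takes its most similar original
    token that no earlier row has claimed.
    """
    if original.shape != learned.shape:
        raise ValueError("embedding matrices must have the same shape")
    # Normalize rows so that a dot product equals cosine similarity.
    a = original / np.linalg.norm(original, axis=1, keepdims=True)
    b = learned / np.linalg.norm(learned, axis=1, keepdims=True)
    sim = b @ a.T  # sim[i, j]: similarity of learned row i to original token j
    mapping = np.empty(len(learned), dtype=np.int64)
    taken = np.zeros(len(original), dtype=bool)
    for i in range(len(learned)):
        # Exclude tokens that are already assigned, keeping the map one-to-one.
        scores = np.where(taken, -np.inf, sim[i])
        j = int(np.argmax(scores))
        mapping[i] = j
        taken[j] = True
    return mapping


def obfuscate(token_ids, mapping):
    """Replace every token id t with mapping[t]."""
    return [int(mapping[t]) for t in token_ids]


# Toy example: the "learned" matrix is a noisy row permutation of the original.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 8))
perm = np.array([2, 0, 4, 1, 3])
E_learned = E[perm] + 0.01 * rng.normal(size=(5, 8))
m = confusion_mapping(E, E_learned)
```

Because the assignment is one-to-one, the mapping is a permutation of the vocabulary, so each original token has exactly one obfuscated counterpart.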

Repository Structure

├── data # training data
├── data_loader_gen.py # data loader for code generation
├── data_loader.py # data loader for code summarization
├── data_loader_translate.py # data loader for code translation
├── human-eval # HumanEval evaluation harness
├── modeling_llama.py # LLaMA model definition for the LLM
├── README.md
├── requirements.txt
├── train_matrix_iter_gen.py # code for running code completion task
├── train_matrix_iter_summary.py # code for running code summarization task
├── train_matrix_iter_translate.py # code for running code translation task
└── utils # utility functions

Dataset

The training data for code summarization can be downloaded from here. The data for the other tasks is in the data folder.

Environment

pip install -r requirements.txt
pip install -e human-eval

Run

To run the code, you can execute the following commands. A detailed list of hyperparameters can be found in Appendix C of the paper.

python train_matrix_iter_gen.py # code for code completion
python train_matrix_iter_summary.py # code for summarization
python train_matrix_iter_translate.py # code for translation
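To run several tasks in one go, a small driver like the one below could be used from the repository root. It is a hypothetical sketch, not part of this repository. It passes no command-line arguments to the scripts, because their options are not documented here (see Appendix C of the paper for hyperparameters), and it stops at the first script that exits with an error.

```python
import subprocess
import sys

# Hypothetical task-to-script table, taken from the commands above.
SCRIPTS = {
    "completion": "train_matrix_iter_gen.py",
    "summarization": "train_matrix_iter_summary.py",
    "translation": "train_matrix_iter_translate.py",
}


def build_commands(tasks):
    """Return one command (argument list) per requested task."""
    unknown = [t for t in tasks if t not in SCRIPTS]
    if unknown:
        raise ValueError(f"unknown task(s): {unknown}")
    # Use the current interpreter so the same environment is reused.
    return [[sys.executable, SCRIPTS[t]] for t in tasks]


def run_all(tasks=("completion", "summarization", "translation"), dry_run=False):
    """Print and (unless dry_run) run each training script in turn."""
    for cmd in build_commands(tasks):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script fails.
            subprocess.run(cmd, check=True)
```

For example, `run_all(("summarization",), dry_run=True)` only prints the command it would execute.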
