The repository contains the code for the research paper "Beyond Binary Classification: Customizable Text Watermark on Large Language Models".

Arrtourz/Customizable-Text-Watermark-on-LLM

Beyond Binary Classification: Customizable Text Watermark on Large Language Models

This repository contains the implementation of the research presented in the paper "Beyond Binary Classification: Customizable Text Watermark on Large Language Models". The code provides a method to embed and detect watermarks in texts generated by large language models (LLMs), enabling a new approach to text authentication and origin verification.

Repository Structure

The repository is structured as follows:

  • gpt_logp.py: Implements the gpt_logp class, which interacts with the OpenAI API to obtain log probabilities for given text inputs.
  • signal_processing.py: Contains functions related to signal processing, including generate_qpsk_signal and decode_to_ascii, which are crucial for the watermark embedding and extraction processes.
  • text_generation.py: Provides functionalities to generate text with or without specific patterns using OpenAI's GPT models.
  • text_analysis.py: Includes functions for analyzing text, such as calculating perplexity and analyzing text watermarks.
  • main.py: The main script that demonstrates how to use the individual modules together.
  • demo.ipynb: A Jupyter notebook that serves as a live demonstration of the watermarking process.
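Two of the building blocks named in the list above, QPSK encoding of an ASCII message and perplexity computed from token log probabilities, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in signal_processing.py or text_analysis.py: the function names (text_to_bits, qpsk_modulate, qpsk_demodulate, bits_to_text, perplexity) and the bit-to-symbol mapping are hypothetical, and the actual signatures of generate_qpsk_signal and decode_to_ascii may differ.

```python
import math
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the repository's API.
_SCALE = 1 / math.sqrt(2)  # gives every QPSK symbol unit energy


def text_to_bits(message: str) -> List[int]:
    """Expand an ASCII string into bits, most significant bit first."""
    bits: List[int] = []
    for byte in message.encode("ascii"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def bits_to_text(bits: Sequence[int]) -> str:
    """Pack groups of 8 bits back into an ASCII string."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return out.decode("ascii")


def qpsk_modulate(bits: Sequence[int]) -> List[complex]:
    """Map each bit pair to one symbol: first bit sets the sign of the
    real part, second bit sets the sign of the imaginary part (0 -> +, 1 -> -)."""
    if len(bits) % 2:
        raise ValueError("bit count must be even")
    return [
        complex((1 - 2 * bits[i]) * _SCALE, (1 - 2 * bits[i + 1]) * _SCALE)
        for i in range(0, len(bits), 2)
    ]


def qpsk_demodulate(symbols: Sequence[complex]) -> List[int]:
    """Recover bits by checking which quadrant each symbol falls in."""
    bits: List[int] = []
    for s in symbols:
        bits.append(1 if s.real < 0 else 0)
        bits.append(1 if s.imag < 0 else 0)
    return bits


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one log probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Round trip: encode a short message as QPSK symbols and decode it again.
symbols = qpsk_modulate(text_to_bits("Hi"))
recovered = bits_to_text(qpsk_demodulate(symbols))
```

Because the decoder only looks at the sign of each component, the round trip tolerates moderate perturbation of the symbols. How the repository carries such a signal in generated text is described in the paper and demo.ipynb, not in this sketch.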

Demo Notebook

demo.ipynb walks through the watermark embedding and extraction process step by step. To run it, install Jupyter Notebook and open demo.ipynb in a Jupyter environment.

Installation

To run the code in this repository, you will need Python 3.7 or later. Clone the repository and install the required dependencies:

git clone https://github.com/Arrtourz/Customizable-Text-Watermark-on-LLM.git
cd Customizable-Text-Watermark-on-LLM
pip install -r requirements.txt

Usage

To use the code, first set up your environment with the necessary API keys and configurations. Then you can run main.py to see a basic example of the watermarking process, or explore demo.ipynb for an interactive experience.

python main.py

Or, for an interactive demo:

jupyter notebook demo.ipynb
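Before running either command, it can help to confirm that an API key is actually visible to Python. The check below is a small sketch under one assumption: that the key is supplied through the OPENAI_API_KEY environment variable, which the official OpenAI Python client reads by default. Whether this repository's modules read the key from that variable or from another configuration source is not stated here, and require_api_key is a hypothetical helper, not part of the repository.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    The variable name is an assumption; adjust it if the project
    expects the key somewhere else.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running main.py"
        )
    return key
```

Calling require_api_key() at the top of a script fails fast with a readable error, rather than failing later inside an API request.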

Contributing

Contributions to this project are welcome! Please read our contributing guidelines for more information on how to report issues, submit changes, and contribute to the code.

License

This project is licensed under the MIT License.

Citation

If you use this code or our methodology in your research, please cite our paper:

Xu, Z., Xu, R., & Sheng, V. S. (2024, June). Beyond Binary Classification: Customizable Text Watermark on Large Language Models. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

and

Xu, Z., & Sheng, V. S. (2024). Signal Watermark on Large Language Models. arXiv preprint arXiv:2410.06545.

Contact

For any inquiries, please open an issue in this repository or contact the authors directly by email.
