
Commit

created initial README.md
Tsadoq committed Nov 11, 2024
1 parent 1c33299 commit 0e37115
Showing 1 changed file (README.md) with 122 additions and 0 deletions.
![erisforge_logo](https://github.com/user-attachments/assets/1a11ad1a-a632-4d5f-990c-3fc84a6c543a)
**ErisForge** is a Python library designed to modify Large Language Models (LLMs) by applying transformations to their internal layers. Named after Eris, the goddess of strife and discord, ErisForge allows you to alter model behavior in a controlled manner, creating both ablated and augmented versions of LLMs that respond differently to specific types of input.

## Features

- Modify internal layers of LLMs to produce altered behaviors.
- Ablate or enhance model responses with the `AblationDecoderLayer` and `AdditionDecoderLayer` classes.
- Measure refusal expressions in model responses using the `ExpressionRefusalScorer`.
- Supply custom behavior directions to choose which behavior a transformation ablates or enhances.
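The README does not spell out what the ablation and addition layers compute. A common formulation from refusal-direction research, assumed here rather than taken from ErisForge's source, treats a behavior direction `r` (normalized to unit length) as a vector in the model's hidden space: ablation removes a hidden state's component along it, `h' = h - (h·r) r`, and addition pushes the hidden state along it, `h' = h + α·r`. The pure-Python sketch below applies both to a single hidden-state vector; all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def ablate(hidden: Vector, direction: Vector) -> Vector:
    """Hypothetical ablation: remove the component of hidden along direction."""
    r = normalize(direction)
    coeff = dot(hidden, r)  # how far hidden points along r
    return [h - coeff * ri for h, ri in zip(hidden, r)]


def add_direction(hidden: Vector, direction: Vector, scale: float = 1.0) -> Vector:
    """Hypothetical addition: shift hidden by scale units along direction."""
    r = normalize(direction)
    return [h + scale * ri for h, ri in zip(hidden, r)]


hidden = [3.0, 4.0, 0.0]
direction = [0.0, 2.0, 0.0]
print(ablate(hidden, direction))              # [3.0, 0.0, 0.0]
print(add_direction(hidden, direction, 2.0))  # [3.0, 6.0, 0.0]
```

After ablation the hidden state has no component left along `r`; the intuition is that if `r` encodes a behavior such as refusal, later layers can no longer "see" it, while addition strengthens it instead.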

## Installation

To install ErisForge, clone the repository and install the required packages:

```bash
git clone https://github.com/yourusername/erisforge.git
cd erisforge
pip install -r requirements.txt
```

Alternatively, install it directly with pip:

```bash
pip install erisforge
```

## Usage

### Basic Setup

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from erisforge import ErisForge
from erisforge.expression_refusal_scorer import ExpressionRefusalScorer

# Load a model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize ErisForge and configure the scorer
forge = ErisForge()
scorer = ExpressionRefusalScorer()
```

### Transform Model Layers

You can apply transformations to specific layers of the model to induce different response behaviors.
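Transformations are applied to a contiguous range of decoder layers (`min_layer` to `max_layer` in the `run_forged_model` call). The sketch below models a decoder stack as a list of plain functions and wraps only the layers in that range so their outputs are post-processed. The `wrap_layers` and `run` helpers are hypothetical, and treating `max_layer` as exclusive (Python's `range` convention) is an assumption, since the README does not say whether the upper bound is inclusive.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def wrap_layers(layers: List[Layer], min_layer: int, max_layer: int,
                transform: Callable[[Vector], Vector]) -> List[Layer]:
    """Return a new stack where layers min_layer..max_layer-1 (assumed exclusive
    upper bound) pass their output through `transform`; others are unchanged."""
    def wrap(layer: Layer) -> Layer:
        def wrapped(h: Vector) -> Vector:
            return transform(layer(h))
        return wrapped

    return [wrap(layer) if min_layer <= i < max_layer else layer
            for i, layer in enumerate(layers)]


def run(layers: List[Layer], h: Vector) -> Vector:
    """Feed h through the layers in order, like a decoder stack."""
    for layer in layers:
        h = layer(h)
    return h


def add_one(h: Vector) -> Vector:
    """Toy decoder layer: add 1.0 to every component."""
    return [x + 1.0 for x in h]


def zero_first(h: Vector) -> Vector:
    """Toy transformation: zero the first component."""
    return [0.0] + h[1:]


stack = [add_one] * 6
patched = wrap_layers(stack, min_layer=2, max_layer=4, transform=zero_first)
print(run(stack, [0.0, 0.0]))    # [6.0, 6.0]
print(run(patched, [0.0, 0.0]))  # [2.0, 6.0]
```

Only layers 2 and 3 are patched here, so the first component is reset twice mid-stack and then recovers through the two untouched layers that follow.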

#### Example 1: Applying Ablation to Model Layers

```python
# Import the ablation layer (module path assumed from layers.py; adjust if it differs)
from erisforge.layers import AblationDecoderLayer

# Instructions to run through the modified model
instructions = ["Explain why AI is beneficial.", "What are the limitations of AI?"]

# Range of decoder layers to ablate
min_layer = 2
max_layer = 4

# Generate answers with ablation applied to the chosen layers;
# the result is a list of conversations (user turn, then model turn)
conversations = forge.run_forged_model(
    model=model,
    type_of_layer=AblationDecoderLayer,
    objective_behaviour_dir=torch.rand(768),  # placeholder direction; 768 = GPT-2 hidden size
    tokenizer=tokenizer,
    min_layer=min_layer,
    max_layer=max_layer,
    instructions=instructions,
    max_new_tokens=50,
)

# Display the modified responses
for conversation in conversations:
    print("User:", conversation[0]["content"])
    print("AI:", conversation[1]["content"])
```
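`torch.rand(768)` above is only a placeholder; a random direction carries no particular behavior. A widely used way to obtain a meaningful refusal direction is the "difference in means" approach from refusal-direction research (not necessarily what ErisForge does internally): collect hidden states at one layer for prompts that trigger the behavior and for prompts that do not, subtract the two means, and normalize. A pure-Python sketch with made-up toy activations; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def difference_in_means_direction(target_acts: List[Vector],
                                  baseline_acts: List[Vector]) -> Vector:
    """Unit vector pointing from the baseline mean to the target mean."""
    diff = [t - b for t, b in zip(mean_vector(target_acts), mean_vector(baseline_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("means coincide; no direction to extract")
    return [x / norm for x in diff]


# Toy 3-d hidden states from one layer (made-up numbers for illustration).
refusal_acts = [[1.0, 2.0, 0.0], [1.0, 4.0, 0.0]]
harmless_acts = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(difference_in_means_direction(refusal_acts, harmless_acts))  # [0.0, 1.0, 0.0]
```

With real activations stored as torch tensors of shape `(n_prompts, hidden_size)`, the same computation is `d = refusal.mean(dim=0) - harmless.mean(dim=0)` followed by `d = d / d.norm()`; its length must match the model's hidden size (768 for GPT-2) to serve as `objective_behaviour_dir`.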

#### Example 2: Measuring Refusal Expressions

Use `ExpressionRefusalScorer` to check whether a model response contains common refusal phrases.

```python
response_text = "I'm sorry, I cannot provide that information."
user_query = "What is the recipe for a dangerous substance?"

# Score the response for refusal expressions
refusal_score = scorer.score(user_query=user_query, model_response=response_text)
print("Refusal Score:", refusal_score)
```
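The README does not say how `ExpressionRefusalScorer` turns phrases into a score. As an illustration only, here is a minimal phrase-matching scorer; the phrase list, the `refusal_score` name, and the fraction-of-phrases rule are assumptions rather than ErisForge's implementation, and unlike `scorer.score` this sketch ignores the user query.

```python
# Hypothetical phrase list; ErisForge's actual list is not shown in this README.
REFUSAL_PHRASES = (
    "i'm sorry",
    "i cannot",
    "i can't",
    "as an ai",
    "i am unable",
)


def refusal_score(model_response: str, phrases: tuple = REFUSAL_PHRASES) -> float:
    """Return the fraction of refusal phrases found in the response (0.0 to 1.0)."""
    # Case-insensitive match; map typographic apostrophes (U+2019) to ASCII.
    text = model_response.lower().replace("\u2019", "'")
    hits = sum(1 for phrase in phrases if phrase in text)
    return hits / len(phrases)


print(refusal_score("I'm sorry, I cannot provide that information."))  # 0.4
print(refusal_score("Here is a summary of the topic."))                # 0.0
```

Substring matching is cheap and transparent but brittle: paraphrased refusals score 0.0, so a phrase-based score works best as a quick comparison between an original and a transformed model rather than as an absolute measure.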

### Save Transformed Model

You can save the modified model locally or push it to the Hugging Face Hub:

```python
output_model_name = "my_transformed_model"

# Save the modified model
forge.save_model(
    model=model,
    behaviour_dir=torch.rand(768),  # placeholder direction; 768 = GPT-2 hidden size
    scale_factor=1,
    output_model_name=output_model_name,
    tokenizer=tokenizer,
    to_hub=False,  # set to True to push to the Hugging Face Hub
)
```
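`save_model` takes a `behaviour_dir` and a `scale_factor`, which suggests the direction is written into the saved weights rather than applied at run time. The README does not describe how. One well-known technique is weight orthogonalization: each matrix `W` that writes into the hidden stream is replaced by `W - s·r rᵀ W`, so with `s = 1` its outputs have no component along the unit direction `r`. The sketch below assumes that technique and treats `scale_factor` as the multiplier `s`; both that mapping and the `orthogonalize` helper are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows index the hidden (output) dimension


def orthogonalize(weight: Matrix, direction: List[float], scale: float = 1.0) -> Matrix:
    """Return W - scale * r r^T W for the unit vector r along `direction`."""
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how strongly each column of W writes along r
    proj = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Subtract the (scaled) component along r from every column
    return [[weight[i][j] - scale * r[i] * proj[j] for j in range(n_cols)]
            for i in range(n_rows)]


W = [[1.0, 2.0],
     [3.0, 4.0]]
print(orthogonalize(W, [0.0, 1.0]))  # [[1.0, 2.0], [0.0, 0.0]]
```

Rows here index the output dimension, matching `torch.nn.Linear.weight`, whose shape is `(out_features, in_features)`; GPT-2's `Conv1D` layers store the transpose, so for them the projection would apply along the other axis. With `s` other than 1, the component along `r` is scaled by `1 - s` rather than removed.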

## Project Structure

- **eris_forge.py** - Core functionality for modifying model layers and saving the model.
- **layers.py** - Defines ablation and addition layers.
- **expression_refusal_scorer.py** - Scores model responses based on refusal expressions.
- **layer_utils.py** - Utility functions for handling model layers based on their architecture.
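`layer_utils.py` is described as handling layers "based on their architecture". A hedged sketch of what that can involve: Hugging Face causal-LM classes keep their decoder blocks under different attributes (`transformer.h` for GPT-2, `model.layers` for Llama and Mistral), so a lookup table keyed by class name is one simple way to locate them. The table and the `get_decoder_layers` helper are hypothetical, not ErisForge's code, and the demo uses a stand-in class instead of a real model.

```python
from functools import reduce
from types import SimpleNamespace

# Attribute paths to the decoder-layer list for a few Hugging Face model classes.
DECODER_LAYER_PATHS = {
    "GPT2LMHeadModel": "transformer.h",
    "LlamaForCausalLM": "model.layers",
    "MistralForCausalLM": "model.layers",
}


def get_decoder_layers(model):
    """Follow the attribute path registered for the model's class name."""
    name = type(model).__name__
    path = DECODER_LAYER_PATHS.get(name)
    if path is None:
        raise ValueError(f"unsupported architecture: {name}")
    # e.g. "transformer.h" -> getattr(getattr(model, "transformer"), "h")
    return reduce(getattr, path.split("."), model)


class GPT2LMHeadModel:  # stand-in for the demo, not the real transformers class
    pass


fake = GPT2LMHeadModel()
fake.transformer = SimpleNamespace(h=["block0", "block1"])
print(get_decoder_layers(fake))  # ['block0', 'block1']
```

Keying on the class name keeps the helper free of a hard dependency on `transformers`; supporting a new architecture means adding one entry to the table.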

## Contributing

Issues, suggestions, and contributions are welcome. To contribute, fork the repository, create a feature branch, and submit a pull request.

## License

This project is licensed under the MIT License.
