A barebones set of utilities for few-shot prompting with Hugging Face models.
Currently, it supports few-shot classification (through generation), along with generating log probabilities for each class.
This is meant to be as lightweight as possible and is not intended to be a full-fledged library.
Running a task requires four additional files. example_task contains the setup for running a simple question answering task. A brief description of the four files follows:
- A `prompt.json` should contain prompts for generation. Each prompt should be in the following format:

```json
{
    "prompt_name": {
        "zero_shot": "You are an expert in US history. Given a question about a US political figure, answer it with a short paragraph.\n\nQUESTION: {question}\nANSWER: ",
        "followup": "QUESTION: {question}\nANSWER: "
    }
}
```

  Here, `zero_shot` is the initial prompt along with the input variables; in the case of zero-shot classification, this is the only prompt that will be needed. `followup` is a follow-up prompt that fixes the format for exemplars.

  For a more involved example of a prompt, see examples/sample_prompt.json.
- An `exemplars.jsonl` should contain exemplars in jsonlines format with the input and output variables:

```json
{"input": "Who was the first president of the United States?", "output": "George Washington"}
...
```

  See examples/sample_exemplars.jsonl for a sample file.
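The library's exact prompt-assembly logic is internal, but a minimal sketch illustrates how the `zero_shot` and `followup` templates might combine with exemplars to build a few-shot prompt. The assembly scheme below (zero-shot template first, followup template for remaining exemplars and the query) is an assumption for illustration, not the library's verbatim implementation:

```python
def build_fewshot_prompt(prompts: dict, exemplars: list[dict], query: dict) -> str:
    """Assemble a few-shot prompt: the zero_shot template for the first
    exemplar, the followup template for the rest and for the final query.
    (Assumed scheme, shown only to clarify the two template roles.)"""
    parts = [prompts["zero_shot"].format(question=exemplars[0]["input"])
             + exemplars[0]["output"]]
    for ex in exemplars[1:]:
        parts.append(prompts["followup"].format(question=ex["input"]) + ex["output"])
    # End with the query, leaving the answer slot open for the model to fill.
    parts.append(prompts["followup"].format(question=query["input"]))
    return "\n\n".join(parts)

prompts = {
    "zero_shot": "Answer the question.\n\nQUESTION: {question}\nANSWER: ",
    "followup": "QUESTION: {question}\nANSWER: ",
}
exemplars = [{"input": "Who was the first president of the United States?",
              "output": "George Washington"}]
query = {"input": "Who was the second president of the United States?"}
print(build_fewshot_prompt(prompts, exemplars, query))
```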
- A `dataset.jsonl` should contain the dataset in jsonlines format with the input (and optionally output) variables. NOTE: each dataset item must have a key, to be specified under `dataset -> id_key` in the config file:

```json
{"q_id": "1001", "input": "Who was the second president of the United States?", "output": "John Adams"}
...
```

  See examples/sample_dataset.jsonl for a sample file.
- A `config.yml` file should contain various hyperparameters and data paths in a YAML-parsable format. See configs/example_config.yml for a sample file.
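The authoritative schema lives in configs/example_config.yml; the fragment below is only a hypothetical sketch built from the two settings this README mentions (`dataset -> id_key` and `output_file`). Every other key name here is invented for illustration:

```yaml
# Hypothetical sketch -- consult configs/example_config.yml for the real schema.
dataset:
  path: data/dataset.jsonl    # assumed key name for the dataset location
  id_key: q_id                # field that uniquely identifies each dataset item
output_file: results/answers.jsonl   # where outputs are saved, if specified
```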
See the create_task subdirectory for code that will create these files automatically for you from CSV and text files, which you may find more convenient.
Create an empty conda environment:

```
conda create -n hf-fewshot python=3.11
conda activate hf-fewshot
```

After cloning the repository, run the following commands:

```
cd hf-fewshot
pip install -e .
```
- If using OpenAI, add the OpenAI API key to your environment variables as `OPENAI_API_KEY`. If using conda, you can do this by running `conda env config vars set OPENAI_API_KEY=<your key>`.
- If using a Hugging Face "gated" model (such as Llama 3), add the Hugging Face access token to an environment variable called `HF_TOKEN` by running `conda env config vars set HF_TOKEN=<your token>`.
To run few-shot prompting, run the following command:

```
hf_fewshot --config configs/example_config.yml
```
If `output_file` is specified in the config, the output will be saved to that location. Otherwise, a file name will be generated automatically in that directory prior to saving.