I need to know how to use `<filename>`, `<fim_*>`, and the other special tokens listed in the tokenizer's `special_tokens_map` when preparing the dataset.
I've successfully fine-tuned StarCoder on my own code, but I didn't prepare the dataset specifically for FIM (fill-in-the-middle). Since the VSCode extension uses FIM prompts, I suspect the fine-tuned model may perform worse there than it could.
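For context, here is a minimal sketch of what a FIM transformation of a training sample could look like, assuming the common PSM (prefix-suffix-middle) layout `<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}` with character-level cut points. The function name `make_fim_sample`, the `fim_rate` default of 0.5, and the placement of the optional `<filename>` header are illustrative assumptions, not necessarily how StarCoder's own pretraining pipeline does it. Check the exact token strings against your tokenizer's `special_tokens_map`.

```python
import random
from typing import Optional

# Token strings assumed to match StarCoder's special_tokens_map; verify with your tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_MIDDLE = "<fim_middle>"
FIM_SUFFIX = "<fim_suffix>"
FILENAME = "<filename>"


def make_fim_sample(
    code: str,
    rng: random.Random,
    fim_rate: float = 0.5,
    filename: Optional[str] = None,
) -> str:
    """Hypothetical helper: with probability fim_rate, rearrange `code` into PSM order.

    Otherwise return it unchanged (plain left-to-right training sample).
    The optional <filename> header placement is an assumption.
    """
    header = f"{FILENAME}{filename}\n" if filename else ""
    if not code or rng.random() >= fim_rate:
        return header + code
    # Pick two distinct character offsets; the span between them becomes the "middle".
    lo, hi = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:lo], code[lo:hi], code[hi:]
    # PSM order: the model sees prefix and suffix, then learns to generate the middle.
    return f"{header}{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


if __name__ == "__main__":
    rng = random.Random(42)
    print(make_fim_sample("def add(a, b):\n    return a + b\n", rng, fim_rate=1.0))
```

Mixing FIM and non-FIM samples (rather than converting everything) keeps ordinary left-to-right completion intact. At inference the model would be prompted with the same layout up to `<fim_middle>` and asked to generate the middle.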