Error: Incorrect padding #5

Open
virus188 opened this issue Jun 21, 2024 · 5 comments

Comments

@virus188

The error occurs in the tokenizer cell.

@wdndev
Owner

wdndev commented Jun 21, 2024

You can try this: https://colab.research.google.com/drive/11MQb8Bn4Ck707VEcqqGVdytqOk3OrQQK?usp=sharing
Just run it as-is; it needs access to the internet outside mainland China.

@shenyewei

shenyewei commented Jun 22, 2024

But why do I get `Incorrect padding` when I run it locally?

@wdndev
Owner

wdndev commented Jun 23, 2024

Check whether it is a tiktoken version issue; I haven't run into this myself yet.
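`Incorrect padding` is the message of `binascii.Error`, which Python's `base64.b64decode` raises when its input has an invalid length. tiktoken-format vocabulary files store one base64-encoded token and its integer rank per line, so one possible cause (an assumption, not confirmed in this thread) is a corrupted or incomplete tokenizer file, for example a Git LFS pointer downloaded in place of the real file. The sketch below checks such a file line by line, decoding roughly as tiktoken's loader does, and also prints the installed tiktoken version. The function name `find_bad_lines` and the default path `tokenizer.tiktoken` are hypothetical.

```python
import base64
import binascii
import sys
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def find_bad_lines(path: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for every non-empty line that is not
    '<base64 token> <integer rank>'."""
    bad = []
    contents = Path(path).read_bytes()
    for lineno, line in enumerate(contents.splitlines(), start=1):
        if not line:
            continue  # skip empty lines
        parts = line.split()
        if len(parts) != 2:
            bad.append((lineno, f"expected 2 fields, got {len(parts)}"))
            continue
        token, rank = parts
        try:
            base64.b64decode(token)  # raises binascii.Error on bad length/padding
        except binascii.Error as exc:
            bad.append((lineno, f"base64 error: {exc}"))
            continue
        if not rank.isdigit():
            bad.append((lineno, "rank is not an integer"))
    return bad


if __name__ == "__main__":
    # Report the installed tiktoken version, since a version mismatch is one suspect.
    try:
        print("tiktoken", version("tiktoken"))
    except PackageNotFoundError:
        print("tiktoken is not installed")

    # Hypothetical default path; pass the real vocabulary file as the first argument.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "tokenizer.tiktoken")
    if not path.is_file():
        print(f"no such file: {path}")
    else:
        problems = find_bad_lines(str(path))
        for lineno, reason in problems[:10]:
            print(f"line {lineno}: {reason}")
        print(f"{len(problems)} bad line(s) found")
```

If the file reports bad lines (or looks like a short text file instead of a large vocabulary), re-downloading it is a reasonable first step before changing the tiktoken version.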

@KeepFaithMe

Hello, I'd like to ask: can I add my own custom modules to a large model like this, the way I can with small models?

@wdndev
Owner

wdndev commented Aug 12, 2024

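Yes, you can add them; the main difficulty is training, which needs too much GPU memory. So there are basically two ways to add modules:

  1. LoRA: freeze the backbone and add some bypass parameters alongside the model's linear layers;
  2. An approach like reward models (RM): freeze the backbone and replace lm_head, e.g., for classification tasks, training only lm_head; this still needs quite a lot of GPU memory, though.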
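The two approaches above can be sketched in PyTorch as follows. This is a minimal illustration on a hypothetical toy model, not this repository's code; all names (`ToyLM`, `LoRALinear`, `add_lora`, `replace_head`, `classify`) and the values of `r`/`alpha` are assumptions. In both cases the backbone is frozen with `requires_grad_(False)` and only the newly added parameters are trainable. Taking the last token's output for classification is one common choice, assumed here.

```python
import torch
import torch.nn as nn


class ToyLM(nn.Module):
    """Hypothetical stand-in for a causal LM: embed -> body -> lm_head."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.body = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden)
        )
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Output shape: (batch, seq_len, lm_head.out_features)
        return self.lm_head(self.body(self.embed(input_ids)))


# Approach 1: LoRA, a frozen linear layer plus a trainable low-rank bypass.
class LoRALinear(nn.Module):
    """y = base(x) + (alpha / r) * B(A(x)), with base's weights frozen."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)
        self.lora_a = nn.Linear(base.in_features, r, bias=False)  # default init
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # bypass starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


def _wrap_linears(module: nn.Module, r: int, alpha: float) -> None:
    # Replace every nn.Linear in the tree with a LoRALinear wrapper.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            setattr(module, name, LoRALinear(child, r, alpha))
        else:
            _wrap_linears(child, r, alpha)


def add_lora(model: nn.Module, r: int = 4, alpha: float = 8.0) -> nn.Module:
    model.requires_grad_(False)     # freeze the whole backbone first
    _wrap_linears(model, r, alpha)  # only the new lora_a / lora_b are trainable
    return model


# Approach 2: freeze the backbone and swap lm_head for a task head.
def replace_head(model: ToyLM, num_labels: int) -> ToyLM:
    model.requires_grad_(False)
    hidden = model.lm_head.in_features
    model.lm_head = nn.Linear(hidden, num_labels)  # new layer: trainable by default
    return model


def classify(model: ToyLM, input_ids: torch.Tensor) -> torch.Tensor:
    # Use the last token's output as the sequence-level class logits.
    return model(input_ids)[:, -1, :]


def count_trainable(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


if __name__ == "__main__":
    ids = torch.randint(0, 100, (2, 5))
    lora_model = add_lora(ToyLM())
    print("LoRA trainable params:", count_trainable(lora_model))
    clf_model = replace_head(ToyLM(), num_labels=3)
    print("head trainable params:", count_trainable(clf_model))
    print("class logits shape:", tuple(classify(clf_model, ids).shape))
```

Freezing removes gradients and optimizer state for the backbone, but its full weights still have to be loaded and run on every forward pass, which is why memory remains a concern for both approaches. For real models, Hugging Face's `peft` library provides a maintained LoRA implementation (`LoraConfig`, `get_peft_model`).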


4 participants