-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
请教下:评测判断时用instructGPT+prompt和用这些数据finetune分类模型,哪个评测的相关性更高,有对比数据不 #2
Comments
我们其实也是针对模型的生成做安全评判,也会有优势 |
不好意思,应该是我描述的不太准确,如果是对生成的总体质量评测,因为涉及流畅、事实、一致性类的指标不太好衡量,所以有优势; 或者说如果有一个类似perspectiveAPI的判别器,只考虑效果的话,prompt+LLM会更有优势吗? |
安全本身的定义比较模糊复杂,场景多样,所以可能不像普通的分类任务(例如情感极性二分类)那么简单,或者说难以明确地建模为简单的分类问题。而且安全会涉及到一些知识,LLM也会更有优势。 |
嗯嗯确实,学习到了,多谢多谢 不过看论文,如果没理解错的话,评测是在定义好的13种安全类型上分别做二分类 |
关于这一点我们有在做更细致的实验,可以关注我们的未来的工作 |
看文章里引用的几篇用LLM做评测的论文,好像都是针对生成比较有优势,像这钟通用领域的安全性判别问题,也会有优势吗
The text was updated successfully, but these errors were encountered: