| Group name | Description | Rule / Prompt |
|---|---|---|
| default | Rules for general text quality checks | RuleColonEnd RuleContentNull RuleDocRepeat RuleHtmlEntity RuleIDCard RuleNoPunc RuleSpecialCharacter |
| sft | Rules for checking SFT (supervised fine-tuning) datasets | RuleColonEnd RuleContentNull RuleDocRepeat RuleHtmlEntity RuleNoPunc RuleSpecialCharacter RuleLineStartWithBulletpoint |
| pretrain | Rules for checking pretraining datasets | RuleAlphaWords RuleCapitalWords RuleCharNumber RuleColonEnd RuleContentNull RuleDocRepeat RuleHtmlEntity RuleIDCard RuleLineEndWithEllipsis RuleLineEndWithTerminal RuleLineStartWithBulletpoint RuleLineJavascriptCount RuleLoremIpsum RuleMeanWordLength RuleNoPunc RuleSentenceNumber RuleSpecialCharacter RuleStopWord RuleSymbolWordRatio RuleUniqueWords RuleWordNumber |
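As a rough illustration, the groups in the table above can be modeled as a plain mapping from group name to rule names. This is a hypothetical sketch: `RULE_GROUPS` and `rules_for` are illustrative names, not part of the tool's API, and the rules are represented only by their names, with no checking logic. Set operations make the relationships between groups easy to inspect. Every `default` and `sft` rule also appears in `pretrain`, while `sft` drops `RuleIDCard` and adds `RuleLineStartWithBulletpoint` relative to `default`.

```python
# Hypothetical mirror of the rule-group table; names only, not the tool's own API.
RULE_GROUPS: dict[str, list[str]] = {
    "default": [
        "RuleColonEnd", "RuleContentNull", "RuleDocRepeat", "RuleHtmlEntity",
        "RuleIDCard", "RuleNoPunc", "RuleSpecialCharacter",
    ],
    "sft": [
        "RuleColonEnd", "RuleContentNull", "RuleDocRepeat", "RuleHtmlEntity",
        "RuleNoPunc", "RuleSpecialCharacter", "RuleLineStartWithBulletpoint",
    ],
    "pretrain": [
        "RuleAlphaWords", "RuleCapitalWords", "RuleCharNumber", "RuleColonEnd",
        "RuleContentNull", "RuleDocRepeat", "RuleHtmlEntity", "RuleIDCard",
        "RuleLineEndWithEllipsis", "RuleLineEndWithTerminal",
        "RuleLineStartWithBulletpoint", "RuleLineJavascriptCount",
        "RuleLoremIpsum", "RuleMeanWordLength", "RuleNoPunc",
        "RuleSentenceNumber", "RuleSpecialCharacter", "RuleStopWord",
        "RuleSymbolWordRatio", "RuleUniqueWords", "RuleWordNumber",
    ],
}


def rules_for(group: str) -> list[str]:
    """Return the rule names in a group; raise KeyError for an unknown group."""
    try:
        return RULE_GROUPS[group]
    except KeyError:
        raise KeyError(f"unknown rule group: {group!r}") from None


if __name__ == "__main__":
    default, sft, pretrain = (set(rules_for(g)) for g in ("default", "sft", "pretrain"))
    print(default <= pretrain and sft <= pretrain)  # True
    print(sorted(sft - default))  # ['RuleLineStartWithBulletpoint']
    print(sorted(default - sft))  # ['RuleIDCard']
```

Keeping the groups as data rather than code means a new group is just another dictionary entry, and overlaps between groups can be checked mechanically instead of by reading the table.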