Output all keyword-extraction results, sorted #27
@qinwf
@AlexYoung757 Not necessarily. If the sentence contains fewer than 4 words, the result will not reach 4 keywords either.
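@qinwf When extracting keywords I set `topn` to 1000, but the number of extracted keywords is still smaller than the number of segmented words. What is the reason? Is it because some of the segmented words are not in the IDF corpus?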
@cnhzzx, could you give an example sentence so I can reproduce this?
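Thanks, I can reproduce it now. These words are single-character words. In the upstream source, single-character words and stop words are skipped during extraction, see: jiebaR/inst/include/lib/KeywordExtractor.hpp Lines 113 to 115 in 00446d7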
I'm not sure what the exact purpose of this rule is, so I've removed it from jiebaR for now. cc @yanyiwu. master has been updated; you can install the latest version from GitHub.
“很” (“very”) is in the stop-word list.
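To illustrate why `topn = 1000` can still return fewer keywords than there are segmented words, here is a minimal Python sketch of a TF-IDF ranking that applies the upstream filter described above: single-character words and stop words are dropped before ranking, and only then is the list cut to `topn`. The function name `extract_keywords` and its parameters are hypothetical, not jiebaR's API, and jiebaR itself has since removed the single-character check per the comment above.

```python
from collections import Counter


def extract_keywords(words, idf, stop_words, topn, default_idf=0.0):
    """Rank segmented words by TF-IDF weight (hypothetical helper, not jiebaR's API).

    Assumes the upstream rule: single-character words and stop words are
    skipped, so fewer than `topn` keywords may be returned.
    """
    # Count term frequency only for words that survive the filter
    # (len() counts Unicode code points, so "很" has length 1).
    tf = Counter(w for w in words if len(w) > 1 and w not in stop_words)
    # Weight = term frequency * IDF; words missing from `idf` use default_idf.
    weights = {w: n * idf.get(w, default_idf) for w, n in tf.items()}
    # Sort by descending weight; ties are broken by the word for stable output.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    # topn only caps the list; it cannot add back filtered words.
    return ranked[:topn]


words = ["我们", "很", "喜欢", "自然语言", "处理", "喜欢"]
idf = {"喜欢": 2.0, "自然语言": 8.0, "处理": 5.0}
print(extract_keywords(words, idf, stop_words={"我们", "很"}, topn=1000))
```

With six segmented tokens and `topn = 1000`, only three keywords come back, because “很” and “我们” are filtered out before ranking and the remaining duplicate “喜欢” is merged into a single entry.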
@qinwf
@cnhzzx It uses the average IDF value: jiebaR/inst/include/lib/KeywordExtractor.hpp Lines 93 to 97 in 85e0819
jiebaR/inst/include/lib/KeywordExtractor.hpp Line 177 in 85e0819
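A minimal Python sketch of that fallback, assuming (as the linked lines suggest) that the extractor averages every IDF value loaded from the dictionary and assigns that average to any word the dictionary does not contain. The helper names `average_idf` and `idf_weight` are hypothetical, not jiebaR functions.

```python
def average_idf(idf):
    """Mean of all IDF values in the dictionary (hypothetical helper)."""
    if not idf:
        raise ValueError("IDF dictionary is empty")
    return sum(idf.values()) / len(idf)


def idf_weight(word, idf, idf_avg):
    """IDF for `word`; words missing from the dictionary get the average."""
    return idf.get(word, idf_avg)


idf = {"喜欢": 2.0, "自然语言": 8.0, "处理": 5.0}
avg = average_idf(idf)                # (2.0 + 8.0 + 5.0) / 3 = 5.0
print(idf_weight("喜欢", idf, avg))   # known word: its own IDF, 2.0
print(idf_weight("新词", idf, avg))   # unknown word: the average, 5.0
```

Under this assumption, words outside the IDF corpus are still ranked rather than dropped, so missing IDF entries alone would not explain a shorter keyword list; the single-character and stop-word filter would.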