Warn the user that only zh supports coarse tokenization
hankcs committed May 4, 2022
1 parent 2d5aba2 commit 94b88c0
Showing 2 changed files with 4 additions and 1 deletion.
3 changes: 3 additions & 0 deletions plugins/hanlp_restful/hanlp_restful/__init__.py
@@ -294,6 +294,9 @@ def tokenize(self, text: Union[str, List[str]], coarse: Optional[bool] = None, l
['2021', '年', 'HanLPv2.1', '为', '生产', '环境', '带来', '次世代', '最', '先进的',
'多', '语种', 'NLP', '技术', '。']]
         """
+        language = language or self._language
+        if coarse and language and language != 'zh':
+            raise NotImplementedError(f'Coarse tokenization not supported for {language}. Please set language="zh".')
         doc = self.parse(text=text, tasks='tok/coarse' if coarse is True else 'tok', language=language)
         return next(iter(doc.values()))

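The guard added in this commit can be exercised without a running HanLP server. The following is a minimal sketch, assuming a standalone helper named `resolve_tok_task` (hypothetical, not part of hanlp_restful) that copies the three added lines plus the existing task selection. It returns the task name `tokenize()` would pass to `parse()`, or raises `NotImplementedError` for coarse requests in any language other than `zh` (for example `mul`).

```python
from typing import Optional


def resolve_tok_task(coarse: Optional[bool], language: Optional[str],
                     default_language: Optional[str] = 'zh') -> str:
    """Hypothetical helper mirroring the guard in HanLPClient.tokenize.

    `default_language` stands in for the client's `self._language`.
    """
    # Fall back to the client-wide default, as `language or self._language` does.
    language = language or default_language
    # Coarse tokenization is only accepted when the effective language is 'zh'.
    if coarse and language and language != 'zh':
        raise NotImplementedError(
            f'Coarse tokenization not supported for {language}. Please set language="zh".')
    # Only an explicit True selects the coarse task; None/False mean fine-grained.
    return 'tok/coarse' if coarse is True else 'tok'


if __name__ == '__main__':
    print(resolve_tok_task(True, None))   # tok/coarse (falls back to 'zh')
    print(resolve_tok_task(None, 'mul'))  # tok
    try:
        resolve_tok_task(True, 'mul')
    except NotImplementedError as e:
        print(e)
```

Because the check only fires when `coarse` is truthy and an effective language is known, a call that sets no language on either the request or the client is still forwarded to the server without being checked locally.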
2 changes: 1 addition & 1 deletion plugins/hanlp_restful/setup.py
@@ -10,7 +10,7 @@

 setup(
     name='hanlp_restful',
-    version='0.0.15',
+    version='0.0.16',
     description='HanLP: Han Language Processing',
     long_description=long_description,
     long_description_content_type="text/markdown",