Error creating Japanese NLP Pipeline #80
Comments
Hi @gilliganc, thanks for reporting it. This is probably because we don't have an AveragePerceptronTagger model for Japanese. I'll investigate how to improve this. Meanwhile, you can create a "Tokenizer"-only pipeline.
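For readers following along, the tokenizer-only workaround suggested above might look roughly like the sketch below. It assumes the Catalyst NuGet package is referenced and that `Pipeline.TokenizerFor`, `Document`, `ProcessSingle`, and `ToJson` behave as in the Catalyst README; the sample sentence is made up for illustration. It skips the tagger model entirely, so it yields tokens only and no part-of-speech tags, and it says nothing about how well the tokenizer segments Japanese text.

```csharp
using System;
using Catalyst;
using Mosaik.Core;

// Assumption: a pipeline containing only the tokenizer does not need to load
// the Japanese tagger model, which is the step that fails in this issue.
var nlp = Pipeline.TokenizerFor(Language.Japanese);

// Hypothetical sample text, used only for illustration.
var doc = new Document("これは日本語のテキストです。", Language.Japanese);

// Runs the tokenizer over the document in place.
nlp.ProcessSingle(doc);

// Dumps the tokenized document so the segmentation can be inspected.
Console.WriteLine(doc.ToJson());
```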
Thanks. I think I need more than the tokenizer: I was trying to port some existing Python code based on spaCy to .NET, to see if I could improve performance and make it easier to integrate. According to the person who wrote the original code, I need more than the tokenizer. We are trying to detect the keywords and the nouns in the Japanese text, and I don't think the tokenizer alone would help, right?
Is this being worked on? I still have this error. It's definitely the AveragePerceptronTagger (I'm getting a NullReferenceException). Does the tokenizer even work properly? Is there a reason this spaCy model has been ported without it? The Japanese model is pretty much useless right now if I can't get anything to work. How soon can this be fixed? It looks like spaCy hasn't used averaged perceptron taggers since before version 2.0. They now use neural networks (matrix multiplication). Are all the Catalyst models based on APTs?
@CodeRabbit957 we haven't updated the tagger, as we're no longer using it in our own app. In any case, Catalyst would need to incorporate a proper CJK tokenizer such as https://github.com/leungwensen/cjk-tokenizer to be able to handle Japanese correctly. If you're up for the challenge, PRs are welcome!
Describe the bug
Trying to load the Pipeline for the Japanese model/language results in a MessagePackSerializationException. This is on .NET 6 on Windows 10.
To Reproduce
The second line will error with the exception in the Additional context.
Expected behavior
Create the Pipeline without error and be able to perform NLP on Japanese text.
Additional context