query-work set with a dict freezes in declaration #40

Open
yipcma opened this issue Aug 6, 2016 · 1 comment
Comments

yipcma commented Aug 6, 2016

segmenter <- worker(type = "query", dict = "dict/scel.dict.utf8")

When this line is run, R freezes. I've run other kinds of workers specifying user instead of dict and no problems occur.

Would you kindly illustrate the difference in specifying user and dict?

Also, could you reproduce the bug?

My jiebaR library is the development version from here, running on Ubuntu 14.04.

Thank you very much.

Owner

qinwf commented Aug 8, 2016

Sorry, I cannot reproduce it.

My code:

> writeLines(c("测试jia 10 v","北京 11 n"),con = "../../test.dict")
> cc = worker("query", dict = "../../test.dict")
> cc["测试jia"]
[1] "测试jia"

This was run on Ubuntu 16.04 with R 3.1.1, using the version from GitHub.

The dictionaries are documented: see ?edit_dict on the docs site. I will update the help page for the worker function to mention this.

The system dictionary has three columns. The first column is the word, the second is the frequency of the word, and the third is a part-of-speech tag using labels compatible with ictclas.

The user dictionary has two columns. The first column is the word, and the second is a part-of-speech tag using labels compatible with ictclas. The frequency of every word in the user dictionary is set by user_weight in the worker function. If you want to provide the frequency of a new word yourself, put it in the system dictionary.
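
The two layouts described above can be sketched in base R roughly as follows. This is only an illustration, not jiebaR's own code: the file names and the user-dictionary entries are made up, and the final worker() call is left commented out because it assumes jiebaR is installed and that a "mix" worker accepts dict, user and user_weight together.

```r
# Hypothetical file names and entries, for illustration only.
sys_dict  <- file.path(tempdir(), "example.sys.dict.utf8")
user_dict <- file.path(tempdir(), "example.user.dict.utf8")

# System dictionary layout: word, frequency, part-of-speech tag (three columns).
writeLines(c("测试jia 10 v", "北京 11 n"), con = sys_dict)

# User dictionary layout: word, part-of-speech tag (two columns, no frequency).
writeLines(c("测试jia v", "北京 n"), con = user_dict)

# Count the space-separated columns on each line of a dictionary file.
count_columns <- function(path) {
  lines <- readLines(path, encoding = "UTF-8")
  vapply(strsplit(lines, " ", fixed = TRUE), length, integer(1))
}

sys_cols  <- count_columns(sys_dict)
user_cols <- count_columns(user_dict)

# Stop early if either file does not match the layout it is meant to have.
stopifnot(all(sys_cols == 3L), all(user_cols == 2L))

# Assumption: jiebaR is installed. Uncomment to build a worker from both files;
# user_weight controls the frequency assigned to the user-dictionary words.
# library(jiebaR)
# seg <- worker(type = "mix", dict = sys_dict, user = user_dict,
#               user_weight = "max")
```

Checking the column count before passing a file as dict or user is a cheap way to rule out a layout mismatch between the two kinds of dictionary.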
