[Bug]: Encoding error in the case which use Japanese texts in simple-labelled.py #17846

meniyama5319 · 2025-02-17T22:43:14Z

Bug Description

Hi,
Thank you so much for developing and maintaining this great library.
I hit an encoding error during the property-graph RAG indexing process when using Japanese text.
I fixed it by adding encoding="utf-8" to the two calls below in \llama_index\core\graph_stores\simple_labelled.py:

'with fs.open(persist_path, "w") as f:'
'with fs.open(persist_path, "r") as f:'

I would be very happy if you could make this change in the original code, so that I no longer need to reapply it after pulling your code.

Thank you,

Version

0.12.10

Steps to Reproduce

Create a property graph index from many Japanese documents.
An encoding error then occurs when writing or reading the index JSON.
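
A minimal sketch of how such an error can arise, assuming the cause is a non-UTF-8 default text encoding on the platform. The report includes no traceback, so this is only a plausible reading. The helper try_write and the use of cp1252 as a stand-in for a locale default are hypothetical and not part of LlamaIndex; the built-in open is used in place of fs.open to keep the example self-contained.

```python
import os
import tempfile
from typing import Optional


def try_write(text: str, encoding: str) -> Optional[str]:
    """Write text with the given encoding; return the error name, or None on success.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "graph.json")
        try:
            with open(path, "w", encoding=encoding) as f:
                f.write(text)
        except UnicodeError as exc:
            return type(exc).__name__
    return None


japanese_json = '{"text": "日本語"}'

# cp1252 stands in for a non-UTF-8 locale default (an assumption).
legacy_result = try_write(japanese_json, "cp1252")
# An explicit UTF-8 encoding can represent any Unicode text.
utf8_result = try_write(japanese_json, "utf-8")
print(legacy_result, utf8_result)
```

If the actual failure looks different, the traceback from the real run would show which encoding was in use.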

Relevant Logs/Tracebacks

dosubot · 2025-02-17T22:46:46Z

Hey @meniyama5319,

I'm here to help you with any issues you encounter. Whether it's solving bugs, answering questions, or becoming a contributor, I've got your back. Let's tackle this together!

The issue you're experiencing with encoding errors when using Japanese text in the simple_labelled.py file can be resolved by explicitly setting the encoding to 'utf-8' in the file operations. You can modify the code as follows:

Change the line 'with fs.open(persist_path, "w") as f:' to 'with fs.open(persist_path, "w", encoding="utf-8") as f:'.
Change the line 'with fs.open(persist_path, "r") as f:' to 'with fs.open(persist_path, "r", encoding="utf-8") as f:'.

This change ensures that the file is read and written using UTF-8 encoding, which should handle Japanese characters correctly. This modification aligns with the default encoding used in other parts of the LlamaIndex library, such as the SimpleDirectoryReader class, which also uses UTF-8 encoding ^[1].
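
The effect of passing the encoding explicitly can be sketched as a write/read round trip. This is an illustration under stated assumptions, not the library's code: persist_graph and load_graph are hypothetical names, the dictionary is made-up sample data, and the built-in open stands in for fs.open (it is assumed that the file system object forwards the encoding keyword in text mode).

```python
import json
import os
import tempfile


def persist_graph(data: dict, persist_path: str) -> None:
    """Hypothetical stand-in for the write path, with an explicit encoding."""
    with open(persist_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Japanese characters unescaped in the file,
        # so the file encoding actually matters.
        f.write(json.dumps(data, ensure_ascii=False))


def load_graph(persist_path: str) -> dict:
    """Hypothetical stand-in for the read path, using the same encoding."""
    with open(persist_path, "r", encoding="utf-8") as f:
        return json.loads(f.read())


# Made-up sample data for illustration.
graph = {"nodes": {"n1": {"text": "日本語のテキスト"}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "graph.json")
    persist_graph(graph, path)
    restored = load_graph(path)
    # Read the raw bytes to confirm what was stored on disk.
    with open(path, "rb") as f:
        raw_bytes = f.read()
```

Using the same explicit encoding on both the write and the read side makes the result independent of the platform's locale settings.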


meniyama5319 added the labels bug (Something isn't working) and triage (Issue needs to be triaged/prioritized) on Feb 17, 2025
