serialize() vs print() gotchas with UTF-8 #123

Open
7Two1 opened this issue Jul 8, 2013 · 0 comments

Labels
Triage Issues yet to be triaged

Rhino (git 5f1ad11, latest as of 7 July 2013) ...

serialize() mangles some UTF-8 characters, seemingly at random, on both Windows and CentOS. For example, the […] ellipsis came through intact, but the horizontal [-] (this is not a minus sign) did not. For me, that makes serialize() untrustworthy.
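To pin down exactly which characters are affected, it can help to list the non-ASCII code points in the input and compare them with what ends up in the output. The following is only a minimal sketch: `listNonAscii` is a hypothetical helper, not part of Rhino, and it is written in plain ES5 so it should run in the Rhino shell as well as elsewhere.

```javascript
// Hypothetical helper: report every non-ASCII character in a string
// together with its position, so mangled characters can be identified.
function listNonAscii(text) {
    var found = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        var index = i;
        // Combine a surrogate pair into a single code point.
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < text.length) {
            var low = text.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                code = (code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
                i++; // skip the low surrogate
            }
        }
        if (code > 0x7F) {
            var hex = code.toString(16).toUpperCase();
            while (hex.length < 4) {
                hex = '0' + hex;
            }
            found.push({ index: index, codePoint: 'U+' + hex });
        }
    }
    return found;
}
```

In a wrapper script this could be called on the result of readFile(infile, 'UTF-8') and the entries printed one by one with print(), assuming the file was decoded correctly on the way in.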

My workaround was to run on CentOS and save the processed data by redirecting the output of print().

Here is my wrapper.js file, stripped down as an example:

...
var infile = arguments[0];
serialize(readFile(infile, 'UTF-8'), 'output_filename');
...

java -jar rhinojar ./wrapper.js named_file_containing_utf-8_chars

fails

However,
...
var infile = arguments[0];
print(readFile(infile, 'UTF-8'));
...

java -jar rhinojar ./wrapper.js named_file_containing_utf-8_chars > output_filename

works

As it turned out, serialize() also writes a BOM, which I didn't want, so print() won out. However, due to Windows cmd limitations, redirecting print() output that contains UTF-8 fails without some extra effort. It was easier to install Java and Ant under CentOS than to chase Windows peculiarities.
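To confirm whether an output file really starts with a UTF-8 BOM, its first three bytes can be compared against EF BB BF. This is a rough sketch only: `hasUtf8Bom` is a hypothetical helper, and it assumes the leading bytes of the file have already been read into an array of numbers (how to read them from a Rhino script is not shown here). The masking with 0xFF is there because Java bytes are signed, so they may arrive as negative values.

```javascript
// Hypothetical helper: true if the byte sequence begins with the
// UTF-8 byte order mark (EF BB BF).
function hasUtf8Bom(bytes) {
    if (!bytes || bytes.length < 3) {
        return false;
    }
    // Mask to 0..255 so signed bytes (e.g. -17 for 0xEF) compare correctly.
    return (bytes[0] & 0xFF) === 0xEF &&
           (bytes[1] & 0xFF) === 0xBB &&
           (bytes[2] & 0xFF) === 0xBF;
}
```

If the check returns false for a file that still has unexpected leading bytes, those bytes are something other than a UTF-8 BOM and are worth inspecting in a hex viewer.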

Hope this report helps.

p-bakker added the Triage label (issues yet to be triaged) on Jul 5, 2021