Encoding issues #183

arturgontijo · 2019-01-22T10:11:26Z

If I run snet in a Docker container I get an encoding error:

# snet organization info ramonduraes
Error: 'ascii' codec can't encode character '\xe3' in position 32: ordinal not in range(128)
If you want to see full Traceback then run:
snet --print-traceback [parameters]

# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

I can set locale in Dockerfile and also force it within python:

# PYTHONIOENCODING=utf-8 snet organization info ramonduraes

Organization Name:
 - Ramon Durães

Organization Id:
 - ramonduraes

Owner:
 - 0xCEb196e0236C5B4EE62d5C87692284FBd52fCBD0

I'd like to suggest an UnicodeEncodeError handler.

One approach is using stdout.buffer, with it snet will become console's encoding independent.
Example commands.py#L39:

    def _printout(self, message):
        if self.out_f is not None:
                try:
                        print(message, file=self.out_f)
                except UnicodeEncodeError:
                        sys.stdout.buffer.write(message.encode("utf-8"))

The text was updated successfully, but these errors were encountered:

astroseger · 2019-01-22T10:29:52Z

Looks fine for me... But it should be sys.stdout.buffer.write((message + "\n").encode("utf-8")) or something like this (we need add \n)..

astroseger · 2019-01-22T16:35:16Z

was fixed by #184

ferrouswheel · 2019-01-22T21:18:08Z

I am concerned we are introducing a problem for people. If the console doesn't support UTF-8, we shouldn't force it. I think the correct option is to configure the terminal correctly.

One way to avoid error would be to use message.encode('ascii', 'ignore'), while not forcing utf-8 encoding on a terminal that doesn't support it. This throws away characters that can't be represented.

A better way to avoid error is to use unidecode to replace undisplayable unicode characters with similar ascii characters.

arturgontijo · 2019-01-22T22:15:34Z

I am concerned we are introducing a problem for people. If the console doesn't support UTF-8, we shouldn't force it. I think the correct option is to configure the terminal correctly.

Like pointing this out for users in the README?

One way to avoid error would be to use message.encode('ascii', 'ignore'), while not forcing utf-8 encoding on a terminal that doesn't support it. This throws away characters that can't be represented.

Using "ignore" is fine but I think we also should try "utf-8" before that.
Like happened to me in my Docker container, I've got "utf-8" with the new PR.
With "ignore" it won't break but, in my case, I'll get empty chars while my console does support "utf-8".
What do you think?

A better way to avoid error is to use unidecode to replace undisplayable unicode characters with similar ascii characters.

This should be more elegant but I think we must try to use "utf-8" before that too.

ferrouswheel · 2019-01-22T22:28:20Z

@arturgontijo I think we should try to print utf-8 encoding by default, but in the exception handling choose one of:

use ignore
use unidecode
print a message saying they should set their terminal to UTF-8, and then exit
on the first exception, print a message saying they should set their terminal to UTF-8, then for future exceptions use ignore/unidecode.

A complete solution would be to inspect the terminal environment to decide the correct encoding, but I'm not sure how easy/complex those settings are to get right.

astroseger · 2019-01-23T10:44:06Z

yes. It seems we hurried with the decision. I would vote for the following solution: Intercept "UnicodeEncodeError" and do the following:

print a warning that terminal do not support utf-8 (in stderr with # prefix)
print message.encode('ascii', 'ignore'))

astroseger · 2019-01-23T10:47:14Z

What will happen if we force utf-8 characters on terminal which do not support utf-8? Maybe it was not a bad decision after all? We could force utf-8 but print a warning..

arturgontijo · 2019-01-23T11:12:35Z

I still think that we must "force" the utf-8 too.
A lot of modern python tools rely on that premise (user's console supports it).
If we dive into this the code will have multiple exceptions, example:

Console 1 (does support utf-8 but it is not set)
Console 2 (doesn't support utf-8 at all)

    @staticmethod
    def _print(message, fd):
        message = str(message) + "\n"
        try:
            fd.write(message)
        except UnicodeEncodeError: # <--- Will rise for 1 and 2
            if hasattr(fd, "buffer"):
                fd.buffer.write(message.encode("utf-8")) # <---- Will trow another exception just for 2, right?
            else:
                raise

What will happen if we force utf-8 characters on terminal which do not support utf-8?

I'm kind of lost here, because I can't figure out how to simulate it.

ferrouswheel · 2019-01-23T20:51:01Z

What will happen if we force utf-8 characters on terminal which do not support utf-8?

I'm kind of lost here, because I can't figure out how to simulate it.

I imagine that you'd get some unexpected characters or junk. But I'm unsure.

I was thinking of something like:

    utf8_warning_printed = False
    @staticmethod
    def _print(message, fd):
        try:
            fd.buffer.write(str(message).encode("utf-8") + "\n")
        except UnicodeEncodeError:
            if not utf8_warning_printed:
                fd.buffer.write("Error attempting to encode unicode! Please ensure your locale supports UTF-8 if you want to see correct character representations.")
                utf8_warning_printed = True
            fd.write(str(message).encode("ascii", "ignore") + "\n")

For "Console 1 (does support utf-8 but it is not set)" - I'm pretty sure we can't detect or fix this if the user has the wrong setting. So I think we should just warn them...

If we start trying to be clever I think we'll make things much harder to debug for ourselves (and for our users trying to configure things!).

arturgontijo · 2019-01-24T11:25:56Z

Ok, I think I've got the idea.

    @staticmethod
    def _print(message, fd):
        message = str(message) + "\n"
        try:
            fd.write(message)
        except UnicodeEncodeError:
            fd.write("Error attempting to encode unicode! Please ensure your locale supports UTF-8 if you want to see correct character representations.")
            message = message.encode("ascii", "ignore")
            fd.write(message.decode())

I'm not using utf8_warning_printed because snet-cli runs and exits at once.

About the warning, maybe we can show:
"Error attempting to encode unicode! Please ensure your locale is set to support UTF-8 if you want to see correct character representations."

arturgontijo mentioned this issue Jan 22, 2019

[fix] UnicodeEncodeError handler. #184

Merged

astroseger closed this as completed Jan 22, 2019

arturgontijo reopened this Jan 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encoding issues #183

Encoding issues #183

arturgontijo commented Jan 22, 2019

astroseger commented Jan 22, 2019

astroseger commented Jan 22, 2019

ferrouswheel commented Jan 22, 2019

arturgontijo commented Jan 22, 2019

ferrouswheel commented Jan 22, 2019 •

edited

Loading

astroseger commented Jan 23, 2019

astroseger commented Jan 23, 2019

arturgontijo commented Jan 23, 2019

ferrouswheel commented Jan 23, 2019

arturgontijo commented Jan 24, 2019

Encoding issues #183

Encoding issues #183

Comments

arturgontijo commented Jan 22, 2019

astroseger commented Jan 22, 2019

astroseger commented Jan 22, 2019

ferrouswheel commented Jan 22, 2019

arturgontijo commented Jan 22, 2019

ferrouswheel commented Jan 22, 2019 • edited Loading

astroseger commented Jan 23, 2019

astroseger commented Jan 23, 2019

arturgontijo commented Jan 23, 2019

ferrouswheel commented Jan 23, 2019

arturgontijo commented Jan 24, 2019

ferrouswheel commented Jan 22, 2019 •

edited

Loading