Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding issues #183

Open
arturgontijo opened this issue Jan 22, 2019 · 10 comments
Open

Encoding issues #183

arturgontijo opened this issue Jan 22, 2019 · 10 comments

Comments

@arturgontijo
Copy link
Contributor

If I run snet in a Docker container I get an encoding error:

# snet organization info ramonduraes
Error: 'ascii' codec can't encode character '\xe3' in position 32: ordinal not in range(128)
If you want to see full Traceback then run:
snet --print-traceback [parameters]

# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

I can set locale in Dockerfile and also force it within python:

# PYTHONIOENCODING=utf-8 snet organization info ramonduraes

Organization Name:
 - Ramon Durães

Organization Id:
 - ramonduraes

Owner:
 - 0xCEb196e0236C5B4EE62d5C87692284FBd52fCBD0

I'd like to suggest an UnicodeEncodeError handler.

One approach is using stdout.buffer, with it snet will become console's encoding independent.
Example commands.py#L39:

    def _printout(self, message):
        if self.out_f is not None:
                try:
                        print(message, file=self.out_f)
                except UnicodeEncodeError:
                        sys.stdout.buffer.write(message.encode("utf-8"))
@astroseger
Copy link
Collaborator

Looks fine for me... But it should be sys.stdout.buffer.write((message + "\n").encode("utf-8")) or something like this (we need add \n)..

@astroseger
Copy link
Collaborator

was fixed by #184

@ferrouswheel
Copy link
Contributor

I am concerned we are introducing a problem for people. If the console doesn't support UTF-8, we shouldn't force it. I think the correct option is to configure the terminal correctly.

One way to avoid error would be to use message.encode('ascii', 'ignore'), while not forcing utf-8 encoding on a terminal that doesn't support it. This throws away characters that can't be represented.

A better way to avoid error is to use unidecode to replace undisplayable unicode characters with similar ascii characters.

@arturgontijo
Copy link
Contributor Author

I am concerned we are introducing a problem for people. If the console doesn't support UTF-8, we shouldn't force it. I think the correct option is to configure the terminal correctly.

Like pointing this out for users in the README?

One way to avoid error would be to use message.encode('ascii', 'ignore'), while not forcing utf-8 encoding on a terminal that doesn't support it. This throws away characters that can't be represented.

Using "ignore" is fine but I think we also should try "utf-8" before that.
Like happened to me in my Docker container, I've got "utf-8" with the new PR.
With "ignore" it won't break but, in my case, I'll get empty chars while my console does support "utf-8".
What do you think?

A better way to avoid error is to use unidecode to replace undisplayable unicode characters with similar ascii characters.

This should be more elegant but I think we must try to use "utf-8" before that too.

@arturgontijo arturgontijo reopened this Jan 22, 2019
@ferrouswheel
Copy link
Contributor

ferrouswheel commented Jan 22, 2019

@arturgontijo I think we should try to print utf-8 encoding by default, but in the exception handling choose one of:

  • use ignore
  • use unidecode
  • print a message saying they should set their terminal to UTF-8, and then exit
  • on the first exception, print a message saying they should set their terminal to UTF-8, then for future exceptions use ignore/unidecode.

A complete solution would be to inspect the terminal environment to decide the correct encoding, but I'm not sure how easy/complex those settings are to get right.

@astroseger
Copy link
Collaborator

yes. It seems we hurried with the decision. I would vote for the following solution: Intercept "UnicodeEncodeError" and do the following:

  • print a warning that terminal do not support utf-8 (in stderr with # prefix)
  • print message.encode('ascii', 'ignore'))

@astroseger
Copy link
Collaborator

What will happen if we force utf-8 characters on terminal which do not support utf-8? Maybe it was not a bad decision after all? We could force utf-8 but print a warning..

@arturgontijo
Copy link
Contributor Author

I still think that we must "force" the utf-8 too.
A lot of modern python tools rely on that premise (user's console supports it).
If we dive into this the code will have multiple exceptions, example:

Console 1 (does support utf-8 but it is not set)
Console 2 (doesn't support utf-8 at all)

    @staticmethod
    def _print(message, fd):
        message = str(message) + "\n"
        try:
            fd.write(message)
        except UnicodeEncodeError: # <--- Will rise for 1 and 2
            if hasattr(fd, "buffer"):
                fd.buffer.write(message.encode("utf-8")) # <---- Will trow another exception just for 2, right?
            else:
                raise

What will happen if we force utf-8 characters on terminal which do not support utf-8?

I'm kind of lost here, because I can't figure out how to simulate it.

@ferrouswheel
Copy link
Contributor

What will happen if we force utf-8 characters on terminal which do not support utf-8?

I'm kind of lost here, because I can't figure out how to simulate it.

I imagine that you'd get some unexpected characters or junk. But I'm unsure.

I was thinking of something like:

    utf8_warning_printed = False
    @staticmethod
    def _print(message, fd):
        try:
            fd.buffer.write(str(message).encode("utf-8") + "\n")
        except UnicodeEncodeError:
            if not utf8_warning_printed:
                fd.buffer.write("Error attempting to encode unicode! Please ensure your locale supports UTF-8 if you want to see correct character representations.")
                utf8_warning_printed = True
            fd.write(str(message).encode("ascii", "ignore") + "\n")

For "Console 1 (does support utf-8 but it is not set)" - I'm pretty sure we can't detect or fix this if the user has the wrong setting. So I think we should just warn them...

If we start trying to be clever I think we'll make things much harder to debug for ourselves (and for our users trying to configure things!).

@arturgontijo
Copy link
Contributor Author

Ok, I think I've got the idea.

    @staticmethod
    def _print(message, fd):
        message = str(message) + "\n"
        try:
            fd.write(message)
        except UnicodeEncodeError:
            fd.write("Error attempting to encode unicode! Please ensure your locale supports UTF-8 if you want to see correct character representations.")
            message = message.encode("ascii", "ignore")
            fd.write(message.decode())

I'm not using utf8_warning_printed because snet-cli runs and exits at once.

About the warning, maybe we can show:
"Error attempting to encode unicode! Please ensure your locale is set to support UTF-8 if you want to see correct character representations."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants