Support for wide characters and emojis #103

Open
ony opened this issue Nov 25, 2019 · 5 comments

Comments

@ony

ony commented Nov 25, 2019

Currently Data.Text.length is used to measure the width of text, which assumes one column per character. That assumption does not hold for Unicode wide characters and emojis, which can occupy 2 cells in the terminal. Some characters may also be invisible, or move the cursor (vertical tab etc.).

Most likely this should be handled on the rendering side (e.g. HTML may align text by means other than spaces). In that case it is worth treating indent/column as just a guideline based on the other characters in the layout, i.e.

|W|i|d|e|W|
|0 |1 |2 |3 |4|5 - virtual columns during layout
|0 |2 |4 |6 |8|9 - actual terminal columns during render

(depending on the font this may render unaligned in a browser, but it should be fine in a terminal)
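
The difference between virtual and terminal columns can be sketched in plain Haskell. This is only an illustration using `base`: `charCells`, `displayWidth` and `columns` are hypothetical names, and the wide ranges are a coarse hand-picked approximation, not the full Unicode East Asian Width tables and not anything prettyprinter provides.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isControl, ord)

-- | Approximate number of terminal cells one character occupies.
-- Hypothetical helper: the ranges are a coarse, hand-picked subset of the
-- East Asian Wide/Fullwidth and emoji blocks, not the full Unicode tables.
-- Control characters are counted as zero cells, although a real terminal
-- may move the cursor for them.
charCells :: Char -> Int
charCells c
  | isControl c = 0
  | generalCategory c `elem` [NonSpacingMark, EnclosingMark, Format] = 0
  | any inRange wideRanges = 2
  | otherwise = 1
  where
    n = ord c
    inRange (lo, hi) = lo <= n && n <= hi
    wideRanges =
      [ (0x1100, 0x115F)   -- Hangul Jamo initial consonants
      , (0x2E80, 0xA4CF)   -- CJK radicals through Yi (coarse)
      , (0xAC00, 0xD7A3)   -- Hangul syllables
      , (0xF900, 0xFAFF)   -- CJK compatibility ideographs
      , (0xFE30, 0xFE4F)   -- CJK compatibility forms
      , (0xFF00, 0xFF60)   -- Fullwidth forms
      , (0xFFE0, 0xFFE6)   -- Fullwidth signs
      , (0x1F300, 0x1F64F) -- Pictographs and emoticons
      , (0x1F900, 0x1F9FF) -- Supplemental symbols and pictographs
      , (0x20000, 0x3FFFD) -- CJK extension planes
      ]

-- | Width of a string in terminal cells under the approximation above.
displayWidth :: String -> Int
displayWidth = sum . map charCells

-- | For each character: its virtual column (one per character, as the
-- layouter counts today) and the terminal column where it actually starts.
columns :: String -> [(Char, Int, Int)]
columns s = zip3 s [0 ..] (scanl (+) 0 (map charCells s))
```

For four fullwidth letters followed by an ASCII `W` (`"\xFF37\xFF49\xFF44\xFF45W"`), `length` gives 5 while `displayWidth` gives 9, and `columns` reports terminal columns 0, 2, 4, 6, 8 against virtual columns 0 to 4, matching the table above.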

Right alignment should mostly work fine as long as people use proper indentation rather than padding characters, though the last character might not always sit tightly against the right side.

More examples can be found in simonmichael/hledger#895

@sjakobi
Collaborator

sjakobi commented Jan 11, 2020

This sounds like a nice feature!

I believe this would be easier to implement in the layouter than the renderer. The layouter is already somewhat output-environment-aware, via PageWidth:

-- | Options to influence the layout algorithms.
newtype LayoutOptions = LayoutOptions { layoutPageWidth :: PageWidth }

Otherwise I believe we'd need to make some big changes to SimpleDocStream.

As a first step, we'd need a way to get the correct character widths – is there a nice, well-maintained Haskell library for that? How does hledger address the problem?

@sjakobi
Collaborator

sjakobi commented Jan 11, 2020

> As a first step, we'd need a way to get the correct character widths – is there a nice, well-maintained Haskell library for that?

I see that tasty relies on wcwidth: https://github.com/feuerbach/tasty/blob/072ecb1cd4f6755f3b974b1c00a36fbd66266181/core/Test/Tasty/Ingredients/ConsoleReporter.hs#L598-L613
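
For reference, a direct binding to the C library's `wcwidth` could look roughly like the sketch below. It is not tasty's code. The names `c_wcwidth`, `wcwidthCells` and `wcswidthCells` are made up for illustration, it assumes a POSIX libc, and the results depend on the process locale and on how current the libc's Unicode tables are.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Char (ord)
import Foreign.C.Types (CInt (..), CWchar (..))

-- | Binding to the C library's wcwidth(3). POSIX only. Imported as a pure
-- function for brevity, even though the result depends on the process
-- locale (LC_CTYPE).
foreign import ccall unsafe "wchar.h wcwidth"
  c_wcwidth :: CWchar -> CInt

-- | Terminal cells for one character. wcwidth returns -1 for characters it
-- considers non-printable; those are treated as zero cells wide here.
wcwidthCells :: Char -> Int
wcwidthCells c = max 0 (fromIntegral (c_wcwidth (fromIntegral (ord c))))

-- | Terminal cells for a whole string, summing per-character widths.
wcswidthCells :: String -> Int
wcswidthCells = sum . map wcwidthCells
```

A layouter could call something like `wcswidthCells` where it currently counts characters, at the cost of platform-dependent output.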

@ony
Author

ony commented Jan 13, 2020

> As a first step, we'd need a way to get the correct character widths – is there a nice, well-maintained Haskell library for that?

You can see a short summary of usages in this comment.

> I see that tasty relies on wcwidth: https://github.com/feuerbach/tasty/blob/072ecb1cd4f6755f3b974b1c00a36fbd66266181/core/Test/Tasty/Ingredients/ConsoleReporter.hs#L598-L613

It would be nice if it were that simple. Unfortunately, the wcwidth library is not actively maintained at the moment. See solidsnack/wcwidth#2.

@sjakobi
Collaborator

sjakobi commented Mar 13, 2021

I noticed that doclayout includes some logic for wide characters via its realLength function: http://hackage.haskell.org/package/doclayout-0.3/docs/src/Text.DocLayout.html#realLength

I suspect it's too simple for your needs, @ony, but it might be worth investigating how much it would take to make it good enough.

@quchen
Owner

quchen commented Jun 8, 2021

A simple fix would be adding an unsafeTextWithLength :: Text -> Int -> Doc ann function, where the text length can be specified by the programmer. I don’t expect emojis etc. are the main use case for this library. Determining char width is something nontrivial (and font-dependent!) that’s out of scope here, but we could offer a keyhole to plug other utilities in that offer something like charWidth :: Char -> Font -> Float.
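
To make the idea concrete, here is a small standalone model of it using only `base`. It is not prettyprinter code: `Measured`, `withLength`, `measureWith` and `alignRight` are hypothetical names standing in for a document leaf that carries a caller-supplied width, and the width function is an integer-valued simplification of the `charWidth` mentioned above.

```haskell
-- | Text paired with the width the layouter should assume for it.
-- Hypothetical stand-in for a document leaf; not a prettyprinter type.
data Measured = Measured
  { measuredWidth :: Int
  , measuredText  :: String
  } deriving (Eq, Show)

-- | Analogue of the proposed unsafeTextWithLength: the caller states the
-- width. "Unsafe" because nothing checks it against what gets rendered.
withLength :: String -> Int -> Measured
withLength s w = Measured w s

-- | The "keyhole": measure with any caller-supplied per-character width.
measureWith :: (Char -> Int) -> String -> Measured
measureWith charWidth s = Measured (sum (map charWidth s)) s

-- | Right-align within a field of n columns, padding by the stored width
-- rather than by the number of characters.
alignRight :: Int -> Measured -> String
alignRight n (Measured w s) = replicate (n - w) ' ' ++ s
```

With this model, `alignRight 6 (withLength "\x65E5\x672C" 4)` pads with two spaces instead of four, because the two characters are declared to be four columns wide.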
