utf8 and utf16 functions that output breaks per code-point #30

mbechard · 2020-09-08T03:00:23Z

I think it would be useful to have variants of the utf8 and utf16 functions that output a brks array with one entry per code point, instead of one entry per code unit as the current ones do. That is, they would give the same results as the utf32 version while accepting utf8 or utf16 input. In some situations I want to be able to consume the output 'brks' without having to think about what my source encoding was.
This seems simple to add: put an if around the loop that increments posLast and sets LINEBREAK_INSIDEACHAR, and in the per-code-point case just increment posLast once per iteration instead.
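
To make the difference between the two layouts concrete, here is a minimal sketch of how a caller might collapse a per-code-unit brks array into a per-code-point one after the fact. It is not the change proposed above and not part of the library: the function name `compact_breaks_per_code_point` and the macro `SKETCH_INSIDEACHAR` (with its numeric value) are hypothetical stand-ins. The sketch assumes that every code point contributes exactly one entry that is not the "inside a character" marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's "inside a character" marker;
   the numeric value is an assumption made for this sketch only. */
#define SKETCH_INSIDEACHAR 3

/* Copies every entry of unit_brks that is not SKETCH_INSIDEACHAR into
   cp_brks, preserving order, and returns the number of entries written
   (under the stated assumption, the number of code points).
   cp_brks must have room for len entries; it may be the same buffer as
   unit_brks, because the write index never overtakes the read index. */
size_t compact_breaks_per_code_point(const char *unit_brks, size_t len,
                                     char *cp_brks)
{
    size_t n = 0;
    assert(unit_brks != NULL && cp_brks != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (unit_brks[i] != SKETCH_INSIDEACHAR) {
            cp_brks[n++] = unit_brks[i];
        }
    }
    return n;
}
```

With this, code that walks the result can index by code point regardless of whether the source text was utf8 or utf16. A built-in variant would avoid the extra pass by never writing the marker entries in the first place.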

mbechard · 2020-09-08T03:06:16Z

I can create a pull request if this is something you are interested in adding to the API.

adah1972 · 2020-09-08T16:08:53Z

I am not sure, but I will definitely look at a PR.

output contained in brks array is per code-point instead of per code-unit implements adah1972#30

Output contained in brks could be per code-point instead of per code-unit. Implements #30.

mbechard mentioned this issue Sep 8, 2020

add per_code_point variants of utf8 and utf16 functions #31

Merged

mbechard added a commit to mbechard/libunibreak that referenced this issue Sep 11, 2020

add per_code_point variants of utf8 and utf16 functions

1ebed6e

output contained in brks array is per code-point instead of per code-unit implements adah1972#30

mbechard added a commit to mbechard/libunibreak that referenced this issue Sep 11, 2020

add per_code_point variants of utf8 and utf16 functions

45d151d

output contained in brks array is per code-point instead of per code-unit implements adah1972#30

adah1972 pushed a commit that referenced this issue Sep 13, 2020

Add per_code_point variants of utf8 and utf16 functions

a6bcee2

Output contained in brks could be per code-point instead of per code-unit. Implements #30.

mbechard closed this as completed Sep 13, 2020
