utf8 and utf16 functions that output breaks per code-point #30
Comments
I can create a pull request if this is something you are interested in adding to the API.
I am not sure, but I will definitely look at a PR.
mbechard added a commit to mbechard/libunibreak that referenced this issue on Sep 11, 2020
output contained in brks array is per code-point instead of per code-unit implements adah1972#30
adah1972 pushed a commit that referenced this issue on Sep 13, 2020
Output contained in brks could be per code-point instead of per code-unit. Implements #30.
I think it would be useful to have variants of the utf8 and utf16 functions that output a brks array with one entry per code-point, instead of one per code-unit as the current ones do. That is, the same results that come out of the utf32 version, but for utf8 and utf16 input. In some situations I want to be able to consume the output 'brks' without having to think about what my source encoding was.
This seems simple to add: put an if around the loop that increments posLast and sets LINEBREAK_INSIDEACHAR, and in the per-code-point case just increment posLast once per iteration instead.
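To illustrate the output I am after, here is a minimal sketch of a caller-side workaround that compacts an existing per-code-unit brks array into a per-code-point one by dropping the LINEBREAK_INSIDEACHAR entries. The helper name compact_breaks_per_code_point is hypothetical and not part of libunibreak, and the BRK_* constants are local stand-ins whose values are assumed to match the LINEBREAK_* macros in linebreak.h.

```c
#include <stddef.h>

/* Local stand-ins for libunibreak's LINEBREAK_* macros. The values are
 * an assumption; real code should include linebreak.h and use the
 * library's own constants instead. */
enum {
    BRK_MUSTBREAK   = 0,
    BRK_ALLOWBREAK  = 1,
    BRK_NOBREAK     = 2,
    BRK_INSIDEACHAR = 3
};

/* Hypothetical helper (not a libunibreak function).
 * Compacts a per-code-unit break array in place so that it holds one
 * entry per code point. Entries marked BRK_INSIDEACHAR belong to the
 * non-final units of a multi-unit sequence and are dropped; the entry
 * on the final unit carries the actual break value and is kept.
 * Returns the number of entries kept, i.e. the code-point count. */
size_t compact_breaks_per_code_point(char *brks, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        if (brks[i] != BRK_INSIDEACHAR) {
            brks[out++] = brks[i];
        }
    }
    return out;
}
```

Calling this on the brks array after the existing utf8 or utf16 function has filled it should give the per-code-point layout described above, at the cost of a second pass over the array. Doing it inside the library, as proposed, would avoid that extra pass.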