[Proposal] Use ICU message for i18n & l10n #23863

Open
wxiaoguang opened this issue Apr 1, 2023 · 7 comments
Labels
modifies/translation type/feature Completely new functionality. Can only be merged if feature freeze is not active. type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@wxiaoguang
Contributor

wxiaoguang commented Apr 1, 2023

To avoid reinventing the wheel, it's better to use the ICU message format for i18n/l10n.

Steps:

  1. Fix the buggy ini package
  2. Clean up all translation strings
  3. Introduce an ICU message parser
  4. Convert legacy plural-related strings to the ICU format
  5. Translate on Crowdin https://support.crowdin.com/icu-message-syntax/
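To make steps 3 and 4 more concrete, here is a minimal Go sketch of what evaluating an ICU plural message involves. It assumes a single plural argument with no nesting and hard-codes the English rule; formatPlural and pluralEN are hypothetical names, not an existing Gitea or library API. A real implementation would use a complete ICU MessageFormat parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pluralEN is the integer form of the CLDR English cardinal rule:
// "one" for exactly 1, "other" for everything else.
func pluralEN(n int) string {
	if n == 1 {
		return "one"
	}
	return "other"
}

// formatPlural renders a message containing at most one ICU-style plural
// argument, e.g. "{count, plural, =0 {none} one {# file} other {# files}}".
// Exact "=N" selectors win over category selectors, "other" is the fallback,
// and "#" is replaced by the count. Nested braces inside a case are not
// supported: this is a sketch, not a full ICU MessageFormat parser.
func formatPlural(msg string, n int, category func(int) string) (string, error) {
	start := strings.Index(msg, "{")
	if start < 0 {
		return msg, nil // no argument: the message is plain text
	}
	end := strings.LastIndex(msg, "}")
	if end < start {
		return "", fmt.Errorf("unbalanced braces in %q", msg)
	}
	parts := strings.SplitN(msg[start+1:end], ",", 3)
	if len(parts) != 3 || strings.TrimSpace(parts[1]) != "plural" {
		return "", fmt.Errorf("unsupported argument %q", msg[start:end+1])
	}
	// Collect "selector {body}" pairs.
	cases := map[string]string{}
	rest := parts[2]
	for {
		open := strings.Index(rest, "{")
		if open < 0 {
			break
		}
		closeOff := strings.Index(rest[open:], "}")
		if closeOff < 0 {
			return "", fmt.Errorf("unterminated case in %q", msg)
		}
		cases[strings.TrimSpace(rest[:open])] = rest[open+1 : open+closeOff]
		rest = rest[open+closeOff+1:]
	}
	body, ok := cases["="+strconv.Itoa(n)]
	if !ok {
		body, ok = cases[category(n)]
	}
	if !ok {
		body, ok = cases["other"]
	}
	if !ok {
		return "", fmt.Errorf("no case matches %d", n)
	}
	return msg[:start] + strings.ReplaceAll(body, "#", strconv.Itoa(n)) + msg[end+1:], nil
}

func main() {
	msg := "There {count, plural, =0 {are no pull requests} one {is # pull request} other {are # pull requests}}."
	for _, n := range []int{0, 1, 5} {
		s, err := formatPlural(msg, n, pluralEN)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

The translator only edits the case bodies; which case applies is decided by the language's plural rule, so Crowdin can validate the selectors.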

The description below is outdated: the old idea was to use a customized message format (a simple syntax similar to ICU messages, but not supported by Crowdin, so Crowdin can't help catch mistakes).

The design of the official golang.org/x/text packages seems clear and would fundamentally resolve Gitea's i18n/l10n problems.

https://pkg.go.dev/golang.org/x/text/message

https://pkg.go.dev/golang.org/x/text/feature/plural

https://github.com/unicode-org/cldr/blob/main/common/supplemental/ordinals.xml

https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml
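For reference, the plurals.xml rules linked above can be sketched as plain Go functions. This is a simplified version that assumes non-negative integer counts (CLDR's fractional operands are ignored) and covers only the en, lv and ar locales used in the examples below; golang.org/x/text/feature/plural is built from the full CLDR data instead.

```go
package main

import "fmt"

// Integer-only versions of the CLDR cardinal plural rules for three locales.
// n is assumed to be non-negative; fractional numbers are not handled.

func pluralEN(n int) string {
	if n == 1 {
		return "one"
	}
	return "other"
}

func pluralLV(n int) string {
	mod10, mod100 := n%10, n%100
	switch {
	case mod10 == 0 || (mod100 >= 11 && mod100 <= 19):
		return "zero"
	case mod10 == 1 && mod100 != 11:
		return "one"
	default:
		return "other"
	}
}

func pluralAR(n int) string {
	mod100 := n % 100
	switch {
	case n == 0:
		return "zero"
	case n == 1:
		return "one"
	case n == 2:
		return "two"
	case mod100 >= 3 && mod100 <= 10:
		return "few"
	case mod100 >= 11 && mod100 <= 99:
		return "many"
	default:
		return "other"
	}
}

func main() {
	for _, n := range []int{0, 1, 2, 3, 11, 21, 100} {
		fmt.Printf("%3d  en=%-5s lv=%-5s ar=%s\n", n, pluralEN(n), pluralLV(n), pluralAR(n))
	}
}
```

This is why a single "one/other" pair is not enough: Latvian treats 11 and 100 as "zero", and Arabic needs six distinct forms.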

I think a translator-friendly syntax is very important: there are already a lot of broken translations, and if we make the system more complex, there will be even more errors.

The syntax should also be designed to work for the frontend (JS/Vue).

As the first step, we should refactor the locale package to make it stable; see the problems

A brief idea about how to maintain the translation strings:

<!-- 1: other -->  {%d $[text]}

<!-- 2: one,other --> {%d $[text,texts]}

<!-- 3: zero,one,other --> {%d $zero[0,1,o]}
<!-- 3: one,two,other --> {%d $two[1,2,o]}
<!-- 3: one,few,other --> {%d $few[1,f,o]}
<!-- 3: one,many,other --> {%d $many[1,m,o]}

<!-- 4: one,two,few,other --> {%d $two-few[1,2,f,o]}
<!-- 4: one,two,many,other --> {%d $two-many[1,2,m,o]}
<!-- 4: one,few,many,other --> {%d $few-many[1,f,m,o]}

<!-- 5: one,two,few,many,other --> {%d $[1,2,f,m,o]}

<!-- 6: zero,one,two,few,many,other --> {%d $[0,1,2,f,m,o]}
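The table above can be read as a lookup from the tag and the number of forms to the plural category each form stands for. A Go sketch, assuming exactly the combinations listed above (categoriesFor and formTable are hypothetical names; the tags are this proposal's own, not part of ICU or CLDR):

```go
package main

import (
	"fmt"
	"strings"
)

// formTable maps "tag|numberOfForms" to the categories of each form, in order,
// following the proposed "$tag[...]" syntax. An empty tag means "$[...]".
var formTable = map[string]string{
	"|1":         "other",
	"|2":         "one,other",
	"zero|3":     "zero,one,other",
	"two|3":      "one,two,other",
	"few|3":      "one,few,other",
	"many|3":     "one,many,other",
	"two-few|4":  "one,two,few,other",
	"two-many|4": "one,two,many,other",
	"few-many|4": "one,few,many,other",
	"|5":         "one,two,few,many,other",
	"|6":         "zero,one,two,few,many,other",
}

// categoriesFor returns the plural category of each form, or an error when
// the tag/form-count combination is not in the table (a likely translator typo).
func categoriesFor(tag string, numForms int) ([]string, error) {
	cats, ok := formTable[fmt.Sprintf("%s|%d", tag, numForms)]
	if !ok {
		return nil, fmt.Errorf("unsupported form $%s with %d entries", tag, numForms)
	}
	return strings.Split(cats, ","), nil
}

func main() {
	cats, err := categoriesFor("zero", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(cats) // [zero one other]
}
```

Rejecting unknown combinations is what would let a tool flag a Latvian string that accidentally lists only two forms.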

Then use the syntax to support different languages:

en: msg = there are {%d $[pull request, pull requests]}
lv: msg = there are {%d $zero[for 0 pull request, pull request, pull requests]}
ar: msg = there are {%d $[for 0, for 1, for 2, few, many, other]}

Another possible approach is to define all concepts ahead of time:

en: NumPR = {%d $[pull request, pull requests]}
lv: NumPR = {%d $zero[for 0 pull request, pull request, pull requests]}
ar: NumPR = {%d $[for 0, for 1, for 2, few, many, other]}

Then NumPR could be reused:

en: msg = there are {$NumPR}
lv: msg = there are {$NumPR}
ar: msg = there are {$NumPR}
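Reusing concepts like NumPR would need a substitution pass before formatting. A minimal Go sketch, assuming references are written as {$Name} and expanded in a single pass (a definition that itself references another concept is not expanded recursively); expandConcepts is a hypothetical name:

```go
package main

import (
	"fmt"
	"regexp"
)

// conceptRef matches a reference such as "{$NumPR}" in a message.
var conceptRef = regexp.MustCompile(`\{\$([A-Za-z][A-Za-z0-9_]*)\}`)

// expandConcepts replaces every "{$Name}" in msg with the definition of Name
// from the per-locale concepts table. Undefined names are reported as an
// error instead of being left silently unexpanded.
func expandConcepts(msg string, concepts map[string]string) (string, error) {
	var missing error
	out := conceptRef.ReplaceAllStringFunc(msg, func(ref string) string {
		name := conceptRef.FindStringSubmatch(ref)[1]
		def, ok := concepts[name]
		if !ok && missing == nil {
			missing = fmt.Errorf("undefined concept %q", name)
		}
		return def
	})
	return out, missing
}

func main() {
	en := map[string]string{"NumPR": "{%d $[pull request, pull requests]}"}
	s, err := expandConcepts("there are {$NumPR}", en)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // there are {%d $[pull request, pull requests]}
}
```

The plural logic then lives in one definition per locale, and every message that says "there are N pull requests" stays consistent.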

If we only need to support one %d, the syntax might be simplified, e.g.:

en: msg = there are %d $[pull request, pull requests]
lv: msg = there are %d $zero[for 0 pull request, pull request, pull requests]
ar: msg = there are %d $[for 0, for 1, for 2, few, many, other]
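With only one %d, rendering the simplified syntax reduces to picking a form and printing the count in front of it. This Go sketch handles only the untagged two-form list (one,other) with the English rule; renderEN and pluralPart are hypothetical names, and other languages would additionally need their own plural rule and the form table from the earlier syntax list.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pluralPart matches the simplified "%d $[form1, form2]" construct.
var pluralPart = regexp.MustCompile(`%d \$\[([^\]]*)\]`)

// renderEN renders a simplified message for English, where the two forms
// are "one,other": a count of 1 selects the first form, anything else the second.
func renderEN(msg string, n int) (string, error) {
	m := pluralPart.FindStringSubmatchIndex(msg)
	if m == nil {
		return "", fmt.Errorf("no plural part in %q", msg)
	}
	forms := strings.Split(msg[m[2]:m[3]], ",")
	if len(forms) != 2 {
		return "", fmt.Errorf("English expects 2 forms, got %d", len(forms))
	}
	form := strings.TrimSpace(forms[1])
	if n == 1 {
		form = strings.TrimSpace(forms[0])
	}
	return msg[:m[0]] + fmt.Sprintf("%d %s", n, form) + msg[m[1]:], nil
}

func main() {
	for _, n := range []int{1, 3} {
		s, err := renderEN("there are %d $[pull request, pull requests]", n)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```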
wxiaoguang added the type/feature and type/proposal labels Apr 1, 2023
wxiaoguang changed the title [Proposal] Use golang's x/text package for i18n & l10n to [Proposal] Use ICU message for i18n & l10n Apr 6, 2023
@lunny
Member

lunny commented Apr 28, 2023

Is there any tool to convert the INI format to the ICU format? Or should we create one?

@wxiaoguang
Contributor Author

I don't quite understand what you mean.

ICU is just a message format; there's no need to convert anything.

@lunny
Member

lunny commented Apr 28, 2023

Maybe we should use another format instead of INI files?

@wxiaoguang
Contributor Author

Why?

@silverwind
Member

YAML may be OK, as it requires less escaping than INI. But one also needs to be aware of its pitfalls, like an unquoted no becoming the boolean false, because YAML is typed and INI isn't.

@wxiaoguang
Contributor Author

wxiaoguang commented Apr 28, 2023

At the moment I don't see any real benefit that YAML would bring.

Actually, we don't need much "escaping" with INI; the current problems are just some legacy bugs.

The only "escaping" requirements are:

  1. The comment character: YAML still needs to escape/quote #
  2. Leading/trailing spaces: YAML still needs to quote them with "
  3. Multi-line strings: YAML's syntax is not as simple as INI's
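To illustrate the first two points, here is a hypothetical minimal INI line parser in Go (it is not the behavior of the ini package Gitea actually uses): a comment is recognized only at the start of a line, and double quotes preserve leading/trailing spaces. Multi-line values are left out of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine parses one "key = value" line of a simplified INI dialect.
// Lines starting with ';' or '#' are comments; a '#' inside a value is kept.
// A value wrapped in double quotes keeps its leading and trailing spaces.
func parseLine(line string) (key, value string, ok bool) {
	trimmed := strings.TrimSpace(line)
	if trimmed == "" || trimmed[0] == ';' || trimmed[0] == '#' {
		return "", "", false
	}
	k, v, found := strings.Cut(trimmed, "=")
	if !found {
		return "", "", false
	}
	key = strings.TrimSpace(k)
	value = strings.TrimSpace(v)
	if len(value) >= 2 && value[0] == '"' && value[len(value)-1] == '"' {
		value = value[1 : len(value)-1]
	}
	return key, value, true
}

func main() {
	for _, line := range []string{
		"; a comment",
		`suffix = " (edited) "`,
		"title = Issue #1",
	} {
		if k, v, ok := parseLine(line); ok {
			fmt.Printf("%s=%q\n", k, v)
		}
	}
}
```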

I think INI still wins.

@silverwind
Member

Found another use case where {placeholder} syntax would have been really useful:

https://github.com/go-gitea/gitea/pull/25050/files#r1214691116
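The linked review comment isn't reproduced here, but as a general illustration, a {placeholder} formatter in Go could look like the sketch below (formatNamed is a hypothetical name). Named placeholders tell translators what each slot holds and let a translation reorder them without positional indexes like %[2]s.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches "{name}" where name is an identifier.
var placeholder = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// formatNamed replaces "{name}" placeholders with values from args.
// Unknown names are left untouched so the mistake stays visible.
func formatNamed(msg string, args map[string]string) string {
	return placeholder.ReplaceAllStringFunc(msg, func(m string) string {
		if v, ok := args[m[1:len(m)-1]]; ok {
			return v
		}
		return m
	})
}

func main() {
	args := map[string]string{"user": "alice", "repo": "gitea"}
	fmt.Println(formatNamed("{user} starred {repo}", args))
	// A translation can use a different word order with the same call site.
	fmt.Println(formatNamed("{repo} was starred by {user}", args))
}
```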


4 participants