Update Unicode to version 16.0.0 #9141

Open
wants to merge 1 commit into
base: master

eksperimental
Contributor

This is an automated commit created by the Maintenance project https://github.com/eksperimental/maintenance

Before merging, please read the release notes by visiting http://www.unicode.org/versions/Unicode16.0.0/
and assess if additional changes are necessary in the code base.

Contributor

github-actions bot commented Dec 3, 2024

CT Test Results

  1 files   11 suites   5m 31s ⏱️
 93 tests  91 ✅ 2 💤 0 ❌
109 runs  107 ✅ 2 💤 0 ❌

Results for commit fe314d7.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

@eksperimental
Contributor Author

http://www.unicode.org/versions/Unicode16.0.0/

New Data Files for Unicode 16.0

DoNotEmit.txt. This is a new file that collects information about characters or character sequences that should not be emitted or generated in newly authored text and for which a suitable alternative sequence exists. This data could be used by applications such as input methods or autocorrect.

Unikemet.txt. This data file provides property and other character information in support of Egyptian hieroglyphs.

@eksperimental
Contributor Author

/cc @dgud

@dgud dgud self-assigned this Dec 4, 2024
@dgud dgud self-requested a review December 4, 2024 08:35
@dgud dgud added the team:PS Assigned to OTP team PS label Dec 4, 2024
#
# For more information, see UAX #11: East Asian Width,
# at https://www.unicode.org/reports/tr11/
#
# @missing: 0000..10FFFF; N
0000..001F;N # Cc [32] <control-0000>..<control-001F>
Contributor

Note to self: the format changed here; double-check that we parse this correctly.
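
A quick way to sanity-check the parsing is a tolerant line parser, sketched here in Python rather than in the OTP generator itself. It assumes the format change concerns whitespace around the `;` separator, which this thread does not confirm; `parse_width_line` and `LINE_RE` are hypothetical names used only for illustration.

```python
import re

# Code point or range, optional whitespace around ';', then the width value.
LINE_RE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)

def parse_width_line(line):
    """Return (first, last, width) for a data line, or None for
    comments, '# @missing' directives, and blank lines."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    m = LINE_RE.match(data)
    if m is None:
        return None
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3)
```

Feeding it both the compact form shown in the hunk and a space-padded variant of the same line should give identical tuples; if the real parser disagrees on either, that is the case to look at.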

@@ -24183,6 +24475,4001 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
13453;EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT TOP AND END;Mn;0;NSM;;;;;N;;;;;
13454;EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT BOTTOM AND END;Mn;0;NSM;;;;;N;;;;;
13455;EGYPTIAN HIEROGLYPH MODIFIER DAMAGED;Mn;0;NSM;;;;;N;;;;;
13460;EGYPTIAN HIEROGLYPH-13460;Lo;0;L;;;;;N;;;;;
Contributor

New large ranges here; check how these are generated.
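
One way to eyeball the new ranges, sketched in Python and not taken from the OTP generator: collapse consecutive code points that share a General_Category and inspect the resulting runs. The name `collapse_ranges` is hypothetical, and the sketch assumes semicolon-separated UnicodeData.txt lines with the category in the third field, as in the hunk above.

```python
def collapse_ranges(lines):
    """Merge consecutive code points with the same General_Category
    into (first, last, category) tuples."""
    ranges = []
    for line in lines:
        fields = line.split(";")
        cp, cat = int(fields[0], 16), fields[2]
        if ranges and ranges[-1][1] + 1 == cp and ranges[-1][2] == cat:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp, cat)
        else:
            ranges.append((cp, cp, cat))
    return ranges
```

Comparing the number and size of runs before and after the update gives a rough idea of how much larger the generated clauses become.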

@dgud dgud added the testing currently being tested, tag is used by OTP internal CI label Dec 17, 2024
@dgud
Contributor

dgud commented Dec 17, 2024

New Data Files for Unicode 16.0
DoNotEmit.txt.
Unikemet.txt.

From what I understand we can ignore those files.

@dgud
Contributor

dgud commented Dec 18, 2024

Tests (or specs) are added for Extended grapheme clusters

Some reminders for whenever someone has time to look at this.

%% The DEVANAGARI cases fail; we need to parse IndicSyllabicCategory.txt
%%
%% https://www.unicode.org/reports/tr29/tr29-45.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters
%%
%%
%% https://www.unicode.org/reports/tr44/tr44-34.html#Indic_Conjunct_Break
%% Indic_Conjunct_Break | E | I | This property defines values used in the Grapheme Cluster Break algorithm in [UAX29].
%% Generated as follows:

%% Define the set of applicable scripts. For Unicode 15.1, the set is defined as
%% S = [\p{sc=Beng}\p{sc=Deva}\p{sc=Gujr}\p{sc=Mlym}\p{sc=Orya}\p{sc=Telu}]
%% Then for any character C:
%% InCB = Linker iff C in [S &\p{Indic_Syllabic_Category=Virama}]
%% InCB = Consonant iff C in [S &\p{Indic_Syllabic_Category=Consonant}]
%% InCB = Extend iff C in
%% [\p{gcb=Extend}
%% \p{gcb=ZWJ}
%% -\p{InCB=Linker}
%% -\p{InCB=Consonant}
%% -[\u200C]]
%% Otherwise, InCB = None (the default value)
%%
%%
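
The derivation quoted above can be sketched as a small function. This is Python for illustration only, not the OTP implementation, and it follows just the Unicode 15.1 rule as quoted; the 16.0 header also mentions Canonical_Combining_Class, so the current rule may differ. The function `derive_incb` and its parameters are hypothetical: the caller is assumed to have already looked up the Script, Indic_Syllabic_Category, and Grapheme_Cluster_Break values for the code point.

```python
# Scripts in the set S, as listed in the quoted UAX #44 text.
INCB_SCRIPTS = {"Beng", "Deva", "Gujr", "Mlym", "Orya", "Telu"}

def derive_incb(cp, script, indic_syllabic_category, gcb):
    """Return the Indic_Conjunct_Break value for code point cp,
    given its already-resolved property values (short aliases)."""
    in_s = script in INCB_SCRIPTS
    if in_s and indic_syllabic_category == "Virama":
        return "Linker"
    if in_s and indic_syllabic_category == "Consonant":
        return "Consonant"
    # Linker and Consonant were handled above, so only U+200C
    # still has to be excluded from the Extend set.
    if gcb in ("Extend", "ZWJ") and cp != 0x200C:
        return "Extend"
    return "None"
```

Checking Linker and Consonant first implements the two set subtractions without a separate lookup.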

%% # Derived Property: Indic_Conjunct_Break
%% # Generated from the Grapheme_Cluster_Break, Indic_Syllabic_Category,
%% # Canonical_Combining_Class, and Script properties as described in UAX #44:
%% # https://www.unicode.org/reports/tr44/.

%% # All code points not explicitly listed for Indic_Conjunct_Break
%% # have the value None.
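
Since the derived values are published in a data file, an alternative to deriving them is to read them directly. The Python sketch below assumes lines of the form `<code point or range> ; InCB; <value> # comment`, which is my understanding of the file's layout and should be verified against the actual data; `load_incb` and `incb` are hypothetical names. Unlisted code points fall back to `None`, matching the header quoted above.

```python
def load_incb(lines):
    """Collect (first, last, value) tuples from InCB data lines."""
    table = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3 or fields[1] != "InCB":
            continue  # some other property in the same file
        first, _, last = fields[0].partition("..")
        table.append((int(first, 16), int(last or first, 16), fields[2]))
    return table

def incb(table, cp):
    """Look up cp; unlisted code points have the value 'None'."""
    for first, last, value in table:
        if first <= cp <= last:
            return value
    return "None"
```

A linear scan is enough for a check like this; a real generator would more likely emit ranges directly.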

@dgud dgud removed the testing currently being tested, tag is used by OTP internal CI label Dec 18, 2024
@IngelaAndin IngelaAndin added the stalled waiting for input by the Erlang/OTP team label Jan 14, 2025
Labels
stalled waiting for input by the Erlang/OTP team team:PS Assigned to OTP team PS

3 participants