Update Unicode to version 16.0.0 #9141

eksperimental · 2024-12-03T21:16:51Z

This is an automated commit created by the Maintenance project https://github.com/eksperimental/maintenance

Before merging, please read the release notes by visiting http://www.unicode.org/versions/Unicode16.0.0/
and assess if additional changes are necessary in the code base.


github-actions · 2024-12-03T21:17:36Z

CT Test Results

1 file, 11 suites, 5m 31s ⏱️
93 tests: 91 ✅ passed, 2 💤 skipped, 0 ❌ failed
109 runs: 107 ✅ passed, 2 💤 skipped, 0 ❌ failed

Results for commit fe314d7.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.


// Erlang/OTP Github Action Bot

eksperimental · 2024-12-03T21:26:06Z

http://www.unicode.org/versions/Unicode16.0.0/

New Data Files for Unicode 16.0

DoNotEmit.txt. This is a new file that collects information about characters or character sequences that should not be emitted or generated in newly authored text and for which a suitable alternative sequence exists. This data could be used by applications such as input methods or autocorrect.

Unikemet.txt. This data file provides property and other character information in support of Egyptian hieroglyphs.

eksperimental · 2024-12-03T21:46:24Z

/cc @dgud

dgud · 2024-12-04T08:42:16Z

lib/stdlib/uc_spec/EastAsianWidth.txt

 #
 # For more information, see UAX #11: East Asian Width,
 # at https://www.unicode.org/reports/tr11/
 #
 # @missing: 0000..10FFFF; N
-0000..001F;N     # Cc    [32] <control-0000>..<control-001F>


Note to self: the format changed here; double-check that we parse this correctly.
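The note above concerns the data-line format. The removed line uses the compact `0000..001F;N` form; the sketch below assumes the replacement lines may instead pad the semicolon with whitespace (`0000..001F     ; N`), as other UCD files do, and accepts both. It also picks up the `# @missing` default. `parse_east_asian_width` and `width_of` are hypothetical Python helpers for checking what a parser should produce, not the OTP generator.

```python
import re

# Data lines in either the compact form ("0000..001F;N") or the
# whitespace-padded form ("0000..001F     ; N"); single code points allowed.
LINE_RE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)
# The "# @missing: 0000..10FFFF; N" line that gives the default value.
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(\w+)"
)


def parse_east_asian_width(lines):
    """Return (default, ranges): default is (lo, hi, value) taken from the
    @missing line (or None), ranges is a list of (lo, hi, value)."""
    default = None
    ranges = []
    for raw in lines:
        line = raw.strip()
        m = MISSING_RE.match(line)
        if m:
            default = (int(m.group(1), 16), int(m.group(2), 16), m.group(3))
            continue
        # Drop the trailing comment before matching a data line.
        data = line.split("#", 1)[0]
        m = LINE_RE.match(data)
        if m:
            lo = int(m.group(1), 16)
            hi = int(m.group(2), 16) if m.group(2) else lo
            ranges.append((lo, hi, m.group(3)))
    return default, ranges


def width_of(cp, default, ranges):
    """Look up a code point, falling back to the @missing default."""
    for lo, hi, value in ranges:
        if lo <= cp <= hi:
            return value
    if default and default[0] <= cp <= default[1]:
        return default[2]
    return None
```

A code point not covered by any listed range falls back to the `@missing` value, which is `N` in the excerpt above.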

dgud · 2024-12-04T08:49:40Z

lib/stdlib/uc_spec/UnicodeData.txt

@@ -24183,6 +24475,4001 @@ FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
 13453;EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT TOP AND END;Mn;0;NSM;;;;;N;;;;;
 13454;EGYPTIAN HIEROGLYPH MODIFIER DAMAGED AT BOTTOM AND END;Mn;0;NSM;;;;;N;;;;;
 13455;EGYPTIAN HIEROGLYPH MODIFIER DAMAGED;Mn;0;NSM;;;;;N;;;;;
+13460;EGYPTIAN HIEROGLYPH-13460;Lo;0;L;;;;;N;;;;;


New large ranges here; check how these are generated.
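One way to check the generated ranges is to collapse adjacent UnicodeData.txt entries whose properties (every field except the code point and name) are identical, and compare the result with what the generator emits. The Python sketch below does that and also folds `<..., First>`/`<..., Last>` pairs into one range. `parse_unicode_data` and `collapse` are hypothetical helpers and assume the input lines are sorted by code point, as UnicodeData.txt is.

```python
def parse_unicode_data(lines):
    """Yield (lo, hi, fields) from UnicodeData.txt lines.
    A <..., First>/<..., Last> pair is yielded as a single range."""
    pending_first = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            pending_first = cp
            continue
        if name.endswith(", Last>") and pending_first is not None:
            yield (pending_first, cp, fields)
            pending_first = None
            continue
        yield (cp, cp, fields)


def collapse(entries):
    """Merge adjacent entries whose properties (all fields except the
    code point and name) are identical into (lo, hi, props) runs."""
    runs = []
    for lo, hi, fields in entries:
        props = tuple(fields[2:])
        if runs and runs[-1][2] == props and runs[-1][1] + 1 == lo:
            runs[-1] = (runs[-1][0], hi, props)
        else:
            runs.append((lo, hi, props))
    return runs
```

Applied to the new `13460;EGYPTIAN HIEROGLYPH-13460;Lo;...` block, consecutive entries that share all properties would come out as a single run, which can then be compared with the generated range table.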

dgud · 2024-12-17T12:35:34Z

New Data Files for Unicode 16.0
DoNotEmit.txt.
Unikemet.txt.

From what I understand, we can ignore those files.

dgud · 2024-12-18T19:57:06Z

Tests (or specs) have been added for extended grapheme clusters.

Some reminders for whenever someone has time to look at this.

%% DEVANAGARI fails; we must parse IndicSyllabicCategory.txt
%%
%% https://www.unicode.org/reports/tr29/tr29-45.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters
%%
%%
%% https://www.unicode.org/reports/tr44/tr44-34.html#Indic_Conjunct_Break
%% Indic_Conjunct_Break E I This property defines values used in Grapheme Cluster Break algorithm in [UAX29].
%% Generated as follows:

%% Define the set of applicable scripts. For Unicode 15.1, the set is defined as
%% S = [\p{sc=Beng}\p{sc=Deva}\p{sc=Gujr}\p{sc=Mlym}\p{sc=Orya}\p{sc=Telu}]
%% Then for any character C:
%% InCB = Linker iff C in [S &\p{Indic_Syllabic_Category=Virama}]
%% InCB = Consonant iff C in [S &\p{Indic_Syllabic_Category=Consonant}]
%% InCB = Extend iff C in
%% [\p{gcb=Extend}
%% \p{gcb=ZWJ}
%% -\p{InCB=Linker}
%% -\p{InCB=Consonant}
%% -[\u200C]]
%% Otherwise, InCB = None (the default value)
%%
%%
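The Unicode 15.1 derivation quoted above translates fairly directly into code. The Python sketch below assumes the caller already has each code point's Script, Indic_Syllabic_Category and Grapheme_Cluster_Break values (using the property value names from the UCD files); `indic_conjunct_break` is a hypothetical name. Note that the Indic_Conjunct_Break header quoted in this comment lists Canonical_Combining_Class among the inputs, which the 15.1 rules do not use, so for 16.0 the published derived values are probably the safer source.

```python
# Applicable scripts for Unicode 15.1, as listed in the rules above.
INCB_SCRIPTS = {"Beng", "Deva", "Gujr", "Mlym", "Orya", "Telu"}

ZWNJ = 0x200C  # U+200C ZERO WIDTH NON-JOINER, explicitly excluded from Extend


def indic_conjunct_break(cp, script, isc, gcb):
    """Derive InCB for one code point from its Script, Indic_Syllabic_Category
    and Grapheme_Cluster_Break values, following the 15.1 rules."""
    in_s = script in INCB_SCRIPTS
    if in_s and isc == "Virama":
        return "Linker"
    if in_s and isc == "Consonant":
        return "Consonant"
    # Linker and Consonant were handled above, so only ZWNJ is left to exclude.
    if gcb in ("Extend", "ZWJ") and cp != ZWNJ:
        return "Extend"
    return "None"
```

Checking Linker and Consonant first implements the `-\p{InCB=Linker} -\p{InCB=Consonant}` subtractions by precedence rather than as explicit set differences.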

%% # Derived Property: Indic_Conjunct_Break
%% # Generated from the Grapheme_Cluster_Break, Indic_Syllabic_Category,
%% # Canonical_Combining_Class, and Script properties as described in UAX #44:
%% # https://www.unicode.org/reports/tr44/.

%% # All code points not explicitly listed for Indic_Conjunct_Break
%% # have the value None.
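Because Indic_Conjunct_Break is published as a derived property in DerivedCoreProperties.txt (the header quoted above), with unlisted code points defaulting to None, an alternative to parsing IndicSyllabicCategory.txt is to read the `InCB` entries directly and apply rule GB9c from UAX #29: `\p{InCB=Consonant} [\p{InCB=Extend}\p{InCB=Linker}]* \p{InCB=Linker} [\p{InCB=Extend}\p{InCB=Linker}]* × \p{InCB=Consonant}`. The Python sketch assumes data lines of the form `094D ; InCB; Linker`; `parse_incb`, `incb` and `gb9c_no_break` are hypothetical helpers, and only this one rule is covered, not the full segmentation algorithm.

```python
import re

# Matches e.g. "094D          ; InCB; Linker" or "0915..0939    ; InCB; Consonant".
INCB_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*InCB\s*;\s*(\w+)"
)


def parse_incb(lines):
    """Build a code point -> InCB value table from DerivedCoreProperties lines."""
    table = {}
    for raw in lines:
        data = raw.split("#", 1)[0].strip()
        m = INCB_RE.match(data)
        if not m:
            continue  # comments and other derived properties
        lo = int(m.group(1), 16)
        hi = int(m.group(2), 16) if m.group(2) else lo
        for cp in range(lo, hi + 1):
            table[cp] = m.group(3)
    return table


def incb(table, cp):
    # Code points not explicitly listed have the value None.
    return table.get(cp, "None")


def gb9c_no_break(table, before, after):
    """True if GB9c forbids a break between the code points in `before`
    (text so far) and the code point `after`."""
    if incb(table, after) != "Consonant":
        return False
    seen_linker = False
    # Walk back over the Extend/Linker run; it must contain a Linker
    # and be preceded by a Consonant.
    for cp in reversed(before):
        value = incb(table, cp)
        if value == "Linker":
            seen_linker = True
        elif value == "Extend":
            continue
        elif value == "Consonant":
            return seen_linker
        else:
            return False
    return False
```

For example, KA + VIRAMA followed by SSA (U+0915 U+094D U+0937, क्ष) gives `True`, so no break is allowed before the SSA, while KA followed directly by KA gives `False` because there is no Linker between them.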

dgud self-assigned this Dec 4, 2024

dgud self-requested a review December 4, 2024 08:35

dgud added the team:PS label (Assigned to OTP team PS) Dec 4, 2024

dgud approved these changes Dec 4, 2024


dgud added the testing label (currently being tested, tag is used by OTP internal CI) Dec 17, 2024

dgud removed the testing label (currently being tested, tag is used by OTP internal CI) Dec 18, 2024

IngelaAndin added the stalled label (waiting for input by the Erlang/OTP team) Jan 14, 2025
