[intl] Weird numeric sort in Collator #18566

Closed
arokettu opened this issue May 15, 2025 · 1 comment

Comments

@arokettu
Contributor

arokettu commented May 15, 2025

Description

The following code:

<?php

$arr = [
    '2023-02-04 14:00:00',
    '2023-01-08 12:00:00',
    '2023-01-03 12:00:00',
    '2023-01-03 12:00:00',
    '2021-01-03 12:00:00',
    '2023-01-05 14:00:00',
    '2024-01-03 12:00:00',
    '2023-01-03 12:00:00',
];

$coll = Collator::create('de');

$coll->asort($arr, Collator::SORT_REGULAR);

echo json_encode($arr, JSON_PRETTY_PRINT);

Resulted in this output:

{
    "4": "2021-01-03 12:00:00",
    "0": "2023-02-04 14:00:00",
    "1": "2023-01-08 12:00:00",
    "2": "2023-01-03 12:00:00",
    "3": "2023-01-03 12:00:00",
    "5": "2023-01-05 14:00:00",
    "7": "2023-01-03 12:00:00",
    "6": "2024-01-03 12:00:00"
}

But I expected this output instead:

{
    "4": "2021-01-03 12:00:00",
    "2": "2023-01-03 12:00:00",
    "3": "2023-01-03 12:00:00",
    "7": "2023-01-03 12:00:00",
    "5": "2023-01-05 14:00:00",
    "1": "2023-01-08 12:00:00",
    "0": "2023-02-04 14:00:00",
    "6": "2024-01-03 12:00:00"
}

PHP Version

Since the dawn of time, up to and including 8.4.7; it was apparently introduced in PHP 6 (!)

https://3v4l.org/fVcRj

Operating System

any

@arokettu
Contributor Author

arokettu commented May 15, 2025

The root cause seems to be collator_convert_string_to_number_if_possible(), which decides that these strings can be converted to numbers and turns them all into (int)2021, (int)2023 and (int)2024, so every 2023 timestamp compares equal to every other one:

if( is_numeric == IS_LONG ) {

uint8_t collator_is_numeric( UChar *str, int32_t length, zend_long *lval, double *dval, bool allow_errors )
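To make the mechanism concrete, here is a small C sketch under stated assumptions. `lenient_prefix_long()` and `sort_by_lenient_key()` are hypothetical helpers, not php-src functions: the first models a conversion that keeps the leading integer and ignores trailing data (ASCII integers only), the second orders positions by that key with a stable insertion sort, since PHP's sorts have been stable since 8.0. Once each timestamp is reduced to its year, all the 2023 entries compare equal and simply keep their input order.

```c
#include <ctype.h>
#include <stddef.h>

/* HYPOTHETICAL, simplified model of a lenient numeric conversion: skip
 * leading whitespace, read an optional sign and decimal digits, then stop
 * and silently ignore whatever follows. ASCII integers only, no overflow
 * handling; this is NOT the real collator_is_numeric(). Returns 1 and
 * stores the value in *out if at least one digit was read, else 0. */
int lenient_prefix_long(const char *s, long *out)
{
    long sign = 1, val = 0;
    int digits = 0;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-' || *s == '+') {
        if (*s == '-')
            sign = -1;
        s++;
    }
    while (isdigit((unsigned char)*s)) {
        val = val * 10 + (*s - '0');
        s++;
        digits++;
    }
    /* Trailing data such as "-01-08 12:00:00" is simply discarded. */
    if (digits == 0)
        return 0;
    *out = sign * val;
    return 1;
}

/* Key used by the sketch; strings with no numeric prefix get 0 here,
 * whereas the real comparison would fall back to collation. */
static long key_of(const char *s)
{
    long v = 0;
    if (!lenient_prefix_long(s, &v))
        v = 0;
    return v;
}

/* HYPOTHETICAL helper: fill idx[0..n-1] with the original positions of
 * arr[], ordered by lenient numeric key using a stable insertion sort. */
void sort_by_lenient_key(const char *arr[], size_t n, size_t idx[])
{
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    for (size_t i = 1; i < n; i++) {
        size_t cur = idx[i];
        size_t j = i;
        /* Shift only strictly greater keys, so equal keys keep input order. */
        while (j > 0 && key_of(arr[idx[j - 1]]) > key_of(arr[cur])) {
            idx[j] = idx[j - 1];
            j--;
        }
        idx[j] = cur;
    }
}
```

Fed the eight strings from the report, `sort_by_lenient_key()` yields the position order 4, 0, 1, 2, 3, 5, 7, 6, which is exactly the order of the observed output.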

arokettu changed the title from "Weird numeric sort in collator" to "[intl] Weird numeric sort in Collator" on May 15, 2025
nielsdos self-assigned this on May 24, 2025
nielsdos added a commit to nielsdos/php-src that referenced this issue May 24, 2025
This aligns the behaviour with normal (non-intl) asort() by making the following changes:
  - Use the same trailing whitespace logic as Zend's is_numeric_ex()
  - Don't allow errors on trailing data

Targeting master because of the BC break.
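The stricter rule described in this commit message can be sketched as follows. `strict_is_long()` is a hypothetical, simplified helper, not the actual patch: it accepts only `[whitespace][sign]digits[whitespace]` (ASCII integers, no floats, hex or overflow handling), tolerating trailing whitespace in the spirit of Zend's is_numeric_ex() and rejecting any other trailing data.

```c
#include <ctype.h>

/* HYPOTHETICAL strict check in the spirit of the fix: the whole string
 * must be [whitespace][sign]digits[whitespace]. Anything else left over
 * makes it non-numeric. Returns 1 and stores the value in *out on
 * success, else 0. Not the real php-src implementation. */
int strict_is_long(const char *s, long *out)
{
    long sign = 1, val = 0;
    int digits = 0;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-' || *s == '+') {
        if (*s == '-')
            sign = -1;
        s++;
    }
    while (isdigit((unsigned char)*s)) {
        val = val * 10 + (*s - '0');
        s++;
        digits++;
    }
    if (digits == 0)
        return 0;
    /* Trailing whitespace is tolerated... */
    while (isspace((unsigned char)*s))
        s++;
    /* ...but any other trailing data (e.g. "-01-08 12:00:00") is an error. */
    if (*s != '\0')
        return 0;
    *out = sign * val;
    return 1;
}
```

Under a check like this, none of the timestamps in the report count as numbers, so the comparison would be left to the collator's string ordering, which for these fixed-width timestamps should produce the expected output.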
nielsdos linked a pull request on May 24, 2025 that will close this issue
3 participants