Introduce the concept of sort keys. #2954

meisterT · 2025-03-09T19:28:47Z

Progress towards #2525.

The basic idea is to replace a lot of scattered scoring logic (which
might become more complex when we introduce more scoring options such as
optimization problems) by encoding sufficient information within the
existing rank cache.

The information is encoded into a sort key that is designed to be sorted
in descending order. The sort key is a tuple (serialized as a string in the
database) of fixed-precision decimals. Each tuple entry uses 9 decimals and
is left-padded with 0s, enabling sorting via standard database query
operations. For tuple entries that sort ascendingly (e.g. penalty time),
values are subtracted from a very large constant; this ensures that the
key can be sorted as a whole.

For example, with ICPC scoring, the top 3 teams from NWERC 2024 receive
these sort keys:

00000000000000000000013.000000000,99999999999999999998725.000000000,99999999999999999999711.000000000
00000000000000000000011.000000000,99999999999999999998918.000000000,99999999999999999999701.000000000
00000000000000000000011.000000000,99999999999999999998916.000000000,99999999999999999999706.000000000
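
The encoding described above can be sketched in a few lines. This is an illustration in Python, not the PHP/bcmath code from the PR. The 23 integer digits, the constant `LARGE`, the helper names `encode_entry` and `icpc_sort_key`, the third tuple entry being a generic tie-breaker, and the team scores are all assumptions made for the example.

```python
from decimal import Decimal, localcontext

INT_DIGITS = 23                  # assumed width of the integer part
SCALE = 9                        # 9 decimals, as in the description
LARGE = 10 ** INT_DIGITS - 1     # assumed "very large constant"
WIDTH = INT_DIGITS + 1 + SCALE   # digits + decimal point + decimals


def encode_entry(value, ascending=False):
    """Encode one tuple entry as a zero-padded fixed-precision string."""
    with localcontext() as ctx:
        ctx.prec = 64  # enough digits to hold WIDTH without rounding
        number = Decimal(str(value))
        if ascending:
            # Smaller is better: flip the value so a larger string ranks higher.
            number = LARGE - number
        if not 0 <= number <= LARGE:
            raise ValueError(f"value out of range: {value}")
        fixed = number.quantize(Decimal(1).scaleb(-SCALE))
    return format(fixed, "f").zfill(WIDTH)


def icpc_sort_key(points, penalty_time, tie_breaker):
    """Hypothetical ICPC-style key: points descending, the rest ascending."""
    return ",".join([
        encode_entry(points),
        encode_entry(penalty_time, ascending=True),
        encode_entry(tie_breaker, ascending=True),
    ])


# Made-up scores: (points, penalty time, tie-breaker).
teams = {"A": (11, 1083, 293), "B": (13, 1274, 288), "C": (11, 1081, 298)}
keys = {name: icpc_sort_key(*score) for name, score in teams.items()}

# A plain descending string sort yields the ranking.
ranking = sorted(keys, key=keys.get, reverse=True)
print(ranking)  # B first (most points), then C (less penalty time than A)
```

Because every entry has the same width, comparing the serialized strings character by character gives the same order as comparing the tuples entry by entry.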

This mechanism should facilitate the implementation of other planned
scoring methods, particularly partial scoring and optimization problems,
with reasonable complexity in the business logic.

We are using bcmath with fixed precision to avoid numerical precision
issues caused by the order of floating point operations. While not
critical currently, this will be essential when handling non-integers,
such as those in optimization problems.
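
A small illustration of the floating point issue mentioned above, using Python's `decimal` module as a stand-in for bcmath (an assumption for the sake of a runnable example; the PR itself uses PHP):

```python
from decimal import Decimal

# Floating point: the result depends on the order of the additions.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Fixed-precision decimals: the same sums agree exactly.
d1, d2, d3 = Decimal("0.1"), Decimal("0.2"), Decimal("0.3")
print((d1 + d2) + d3 == d1 + (d2 + d3))  # True
```

With integer-only ICPC scores this does not matter, but two teams with the same non-integer score could otherwise end up with different keys depending on the order in which their results were accumulated.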

The new mechanism also caches some computations and thus improves
scoreboard computation efficiency.

webapp/src/Utils/Scoreboard/Scoreboard.php

Kevinjil · 2025-03-16T10:11:08Z

What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

meisterT · 2025-03-16T10:19:40Z

> What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

If the sort key consisted of a single number, we could use the native decimal type. But the sort key is essentially a tuple of fixed-precision decimals, which lets us generalize away from ICPC scoring (while making the code less scattered and more efficient at the same time). The length of this tuple is not known in advance if we want to support more than one scoring method, so we represent it as a string.

Side note: bcmath, which is the recommended way to work with fixed-precision decimals in PHP, also uses strings as the underlying representation for individual numbers.

Side note on the side note: separately from this PR, we should look into whether it makes sense to use bcmath where we use decimal(32,9) today, since the mapping from/to float loses precision.

eldering · 2025-03-17T22:42:49Z

webapp/tests/Unit/Service/ScoreboardServiceTest.php

+use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;
+
+class ScoreboardServiceTest extends KernelTestCase
+{


Not sure the tests here are very relevant as they test internal details. I'd focus on the behaviour tests at the end.

meisterT · 2025-03-18

They helped me uncover a small bug, so I think they were useful. If at some point they become tedious to update because the internal implementation changes frequently (I don't think so, but who knows), we can always delete them.

webapp/migrations/Version20250309122806.php

eldering · 2025-03-17T22:59:37Z

> What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

I don't think we always have to use these fixed length decimal strings: the sort keys can be different per different contest/scoring type. So in an ICPC scoring setting I think we can just use zero-padded integers.

> If the sort key consisted of a single number, we could use the native decimal type. But the sort key is essentially a tuple of fixed-precision decimals, which lets us generalize away from ICPC scoring (while making the code less scattered and more efficient at the same time). The length of this tuple is not known in advance if we want to support more than one scoring method, so we represent it as a string.
>
> Side note: bcmath, which is the recommended way to work with fixed-precision decimals in PHP, also uses strings as the underlying representation for individual numbers.
>
> Side note on the side note: separately from this PR, we should look into whether it makes sense to use bcmath where we use decimal(32,9) today, since the mapping from/to float loses precision.

Repeating my code comment here: an alternative would be to let the sort key really be an opaque blob, even without the assumption that it is sorted (lexicographically), but instead have

1. a separate function (depending on the scoring type) that maps this opaque blob into something that defines a rank ordering
2. a compare function that defines an order on these opaque blobs

Then the opaque blob could e.g. be some JSON encoding the scoring state.

OTOH, in case 1 this function just maps into something similar to what's currently stored in the sorting key already, so nothing is really gained. I guess 2 could be a bit more generic, but I do like the fact that we now declare an ordering simply by comparing these sort keys. That indeed simplifies the code a lot. So probably stick to that unless we need more flexibility.
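
For comparison, the second alternative (an opaque JSON blob plus a compare function) might look roughly like the sketch below. The blob fields `team`, `points`, and `penalty`, the function name `compare_icpc`, and the scores are hypothetical; nothing like this is part of the PR.

```python
import json
from functools import cmp_to_key


def compare_icpc(blob_a, blob_b):
    """Hypothetical comparator: negative if blob_a ranks before blob_b."""
    a, b = json.loads(blob_a), json.loads(blob_b)
    if a["points"] != b["points"]:
        # More points ranks first.
        return -1 if a["points"] > b["points"] else 1
    if a["penalty"] != b["penalty"]:
        # Less penalty time ranks first.
        return -1 if a["penalty"] < b["penalty"] else 1
    return 0


blobs = [
    json.dumps({"team": "A", "points": 11, "penalty": 1083}),
    json.dumps({"team": "B", "points": 13, "penalty": 1274}),
    json.dumps({"team": "C", "points": 11, "penalty": 1081}),
]

# Ordering needs application code: the blobs themselves are not sortable.
ranked = sorted(blobs, key=cmp_to_key(compare_icpc))
ranked_teams = [json.loads(blob)["team"] for blob in ranked]
print(ranked_teams)
```

The ordering logic here lives in application code, one comparator per scoring type, whereas the sort key approach pushes it into the stored string.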

meisterT · 2025-03-18T06:46:27Z

> > What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?
>
> I don't think we always have to use these fixed length decimal strings: the sort keys can be different per different contest/scoring type. So in an ICPC scoring setting I think we can just use zero-padded integers.

If you care I am happy to change it, but it requires more code (not much of course) for almost no benefit. Who is going to inspect sort keys?

> > If the sort key consisted of a single number, we could use the native decimal type. But the sort key is essentially a tuple of fixed-precision decimals, which lets us generalize away from ICPC scoring (while making the code less scattered and more efficient at the same time). The length of this tuple is not known in advance if we want to support more than one scoring method, so we represent it as a string.
> >
> > Side note: bcmath, which is the recommended way to work with fixed-precision decimals in PHP, also uses strings as the underlying representation for individual numbers.
> >
> > Side note on the side note: separately from this PR, we should look into whether it makes sense to use bcmath where we use decimal(32,9) today, since the mapping from/to float loses precision.
>
> Repeating my code comment here: an alternative would be to let the sort key really be an opaque blob, even without the assumption that it is sorted (lexicographically), but instead have
>
> 1. a separate function (depending on the scoring type) that maps this opaque blob into something that defines a rank ordering
> 2. a compare function that defines an order on these opaque blobs
>
> Then the opaque blob could e.g. be some JSON encoding the scoring state.
>
> OTOH, in case 1 this function just maps into something similar to what's currently stored in the sorting key already, so nothing is really gained. I guess 2 could be a bit more generic, but I do like the fact that we now declare an ordering simply by comparing these sort keys. That indeed simplifies the code a lot. So probably stick to that unless we need more flexibility.

I did indeed start out with a separate function and a JSON blob, but then came up with this idea, and it was quite a bit less code. If at some point we realize that this is not powerful enough, it is trivial to change to what you describe. What I do like about the current code is that sorting when retrieving the scoreboard is purely a database operation, with no business logic required.
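
To make the "purely a database operation" point concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names are hypothetical and do not reflect the real DOMjudge schema; the three keys are the ones from the PR description, with placeholder team names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: one serialized sort key per team.
conn.execute(
    "CREATE TABLE rankcache (team TEXT PRIMARY KEY, sort_key TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO rankcache (team, sort_key) VALUES (?, ?)",
    [
        ("third", "00000000000000000000011.000000000,99999999999999999998916.000000000,99999999999999999999706.000000000"),
        ("first", "00000000000000000000013.000000000,99999999999999999998725.000000000,99999999999999999999711.000000000"),
        ("second", "00000000000000000000011.000000000,99999999999999999998918.000000000,99999999999999999999701.000000000"),
    ],
)

# The ranking is a plain string sort; no scoring logic runs at read time.
rows = conn.execute(
    "SELECT team FROM rankcache ORDER BY sort_key DESC"
).fetchall()
ranking = [team for (team,) in rows]
print(ranking)
conn.close()
```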

But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct.

meisterT · 2025-03-21T21:04:10Z

> But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct.

Well, seems like this is not really possible: https://github.com/doctrine/DoctrineMigrationsBundle/pull/559/files

cc @nickygerritsen in case you have ideas

At least we recommend refreshing the cache here: https://www.domjudge.org/docs/manual/8.2/upgrading.html#upgrading

nickygerritsen · 2025-03-21T21:06:48Z

> > But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct.
>
> Well, seems like this is not really possible: doctrine/DoctrineMigrationsBundle#559 (files)
>
> cc @nickygerritsen in case you have ideas
>
> At least we recommend refreshing the cache here: domjudge.org/docs/manual/8.2/upgrading.html#upgrading

Nope I don't think there is an easy way. What we could do is create a CLI command to refresh the scoreboard cache and then run that on make install or in the upgrade command of the SQL script.

meisterT · 2025-03-21T21:08:45Z

> What we could do is create a CLI command to refresh the scoreboard cache and then run that on make install or in the upgrade command of the SQL script.

Good idea, I will do that (separately from this PR).

meisterT · 2025-03-21T21:22:55Z

see #2971

nickygerritsen

I like it. It looks like it makes things a bit less complex and allows for expansion

webapp/src/Service/ScoreboardService.php

vmcj

Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?

webapp/src/Service/ScoreboardService.php

webapp/tests/Unit/Service/ScoreboardServiceTest.php

meisterT · 2025-03-22T11:29:18Z

> Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?

It is more or less the same, and other parts dominate the total time. Some details below.

Theoretically it does make updates of the cache a tiny bit slower - the good thing here is that this is constant overhead (to construct the sort key) and doesn't scale with the number of teams.

In practice: refreshing the full NWERC scoreboard from last year in prod mode on my laptop takes (median of 11 runs each):

on main: ~20.4s
this PR: ~20.5s

(But there is a lot of noise, so it is likely not statistically significant.)
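
A median-of-N measurement like the one above can be taken with a small harness along these lines. The function name `median_runtime` and the stand-in workload are assumptions; the actual measurement ran a full scoreboard refresh, which is not reproduced here.

```python
import statistics
import time


def median_runtime(task, runs=11):
    """Run task() `runs` times and return the median wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in workload; replace with the operation under test.
elapsed = median_runtime(lambda: sum(range(100_000)))
print(f"median over 11 runs: {elapsed:.6f}s")
```

Taking the median rather than the mean keeps a single slow outlier run from skewing the result, which helps when the measurements are as noisy as described.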

Theoretically, it does make retrieving the scoreboard a tiny bit faster as there is less business logic to execute.

In practice, when running

time ( for i in {1..100}; do curl -n -N http://localhost/domjudge/api/contests/nwerc2024/scoreboard > /dev/null; done )

I see:

on main: ~18.0s
this PR: ~17.9s

(Again there is a lot of noise, so it is likely not statistically significant.)

eldering · 2025-03-22T11:37:13Z

> > Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?
>
> It is more or less the same, and other parts dominate the total time. Some details below.
>
> Theoretically it does make updates of the cache a tiny bit slower - the good thing here is that this is constant overhead (to construct the sort key) and doesn't scale with the number of teams.

I guess you could make it faster by keeping the sort keys smaller. For example for the standard ICPC scoring, we don't need decimals, so the string length can be roughly cut in half.

meisterT · 2025-03-22T12:11:21Z

> I guess you could make it faster by keeping the sort keys smaller. For example for the standard ICPC scoring, we don't need decimals, so the string length can be roughly cut in half.

I tried this, also with ints, and there is no difference in timing.

meisterT changed the title from "Rankcachev2" to "Prototype for discussion of supporting new scoring formats" on Mar 9, 2025

meisterT force-pushed the rankcachev2 branch 12 times, most recently from bd7357f to eb4cf58, on March 15, 2025 10:41

vmcj added a commit to DOMjudge/domjudge-packaging that referenced this pull request on Mar 15, 2025

Add bcmath for DOMjudge/domjudge#2954

25c3d1d

meisterT force-pushed the rankcachev2 branch from eb4cf58 to a468b77 on March 15, 2025 11:49

vmcj reviewed on Mar 15, 2025

meisterT force-pushed the rankcachev2 branch 4 times, most recently from 7d4ce38 to a16f23d, on March 15, 2025 14:18

github-merge-queue bot pushed a commit to DOMjudge/domjudge-packaging that referenced this pull request on Mar 15, 2025

Add bcmath for DOMjudge/domjudge#2954

10dfd94

meisterT force-pushed the rankcachev2 branch from a16f23d to b86806a on March 15, 2025 17:15

meisterT force-pushed the rankcachev2 branch 3 times, most recently from 2699ad8 to 925c0a8, on March 16, 2025 18:45

meisterT changed the title from "Prototype for discussion of supporting new scoring formats" to "Introduce the concept of sort keys." on Mar 16, 2025

meisterT force-pushed the rankcachev2 branch from 925c0a8 to 83d6aa3 on March 16, 2025 18:47

meisterT marked this pull request as ready for review on March 16, 2025 18:47

eldering reviewed on Mar 17, 2025

nickygerritsen approved these changes on Mar 22, 2025

vmcj approved these changes on Mar 22, 2025

meisterT force-pushed the rankcachev2 branch from 7a5fca2 to 3f1274d on March 22, 2025 10:52

meisterT force-pushed the rankcachev2 branch from 3f1274d to 6fb4d53 on March 22, 2025 10:56

meisterT added this pull request to the merge queue on Mar 23, 2025

Merged via the queue into DOMjudge:main with commit 6613b28 on Mar 23, 2025
36 checks passed

meisterT deleted the rankcachev2 branch on March 23, 2025 10:55
