Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the concept of sort keys. #2954

Merged
merged 1 commit into from
Mar 23, 2025
Merged

Conversation

meisterT
Copy link
Member

@meisterT meisterT commented Mar 9, 2025

Progress towards #2525.

The basic idea is to replace a lot of scattered scoring logic (which
might become more complex when we introduce more scoring options such as
optimization problems) by encoding sufficient information within the
existing rank cache.

The information is encoded into a sort key that is designed to be sorted
descendingly. The sort key is a tuple (serialized as string into the
database) of fixed precision decimals. Each tuple uses 9 decimals and is
left padded with 0s, enabling sorting via standard database query
operations. For ascendingly sorted tuple entries (e.g. penalty time),
values are subtracted from a very large constant - this ensures that
that the key can be sorted as a whole.

For example, with ICPC scoring, the top 3 teams from NWERC 2024 receive
these sort keys:

00000000000000000000013.000000000,99999999999999999998725.000000000,99999999999999999999711.000000000
00000000000000000000011.000000000,99999999999999999998918.000000000,99999999999999999999701.000000000
00000000000000000000011.000000000,99999999999999999998916.000000000,99999999999999999999706.000000000

This mechanism should facilitate the implementation of other planned
scoring methods, particularly partial scoring and optimization problems,
with reasonable complexity in the business logic.

We are using bcmath with fixed precision to avoid numerical precision
issues caused by the order of floating point operations. While not
critical currently, this will be essential when handling non-integers,
such as those in optimization problems.

The new mechanism also caches some computations and thus improves
scoreboard computation efficiency.

@meisterT meisterT changed the title Rankcachev2 Prototype for discussion of supporting new scoring formats Mar 9, 2025
@meisterT meisterT force-pushed the rankcachev2 branch 12 times, most recently from bd7357f to eb4cf58 Compare March 15, 2025 10:41
vmcj added a commit to DOMjudge/domjudge-packaging that referenced this pull request Mar 15, 2025
@meisterT meisterT force-pushed the rankcachev2 branch 4 times, most recently from 7d4ce38 to a16f23d Compare March 15, 2025 14:18
github-merge-queue bot pushed a commit to DOMjudge/domjudge-packaging that referenced this pull request Mar 15, 2025
@Kevinjil
Copy link
Contributor

What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

@meisterT
Copy link
Member Author

What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

If the sort key would consist of a simple number, then we could use the native decimal type. But we are using a sort key which is essentially a tuple of fixed precision decimals, allowing to generalize away from ICPC scoring (while making code less scattered and more efficient at the same time). This tuple is of "unknown" length if we want to support more than one scoring method, so we represent it as string.

Side note: bcmath which is the recommended way to work with fixed precision decimal types in PHP, is also using string as underlying representation for individual numbers.

Side note on the side note: separately from this PR, we should look into whether it makes sense using bcmath where we are using decimal(32,9) today, since the mapping from/to float loses precision.

@meisterT meisterT force-pushed the rankcachev2 branch 3 times, most recently from 2699ad8 to 925c0a8 Compare March 16, 2025 18:45
@meisterT meisterT changed the title Prototype for discussion of supporting new scoring formats Introduce the concept of sort keys. Mar 16, 2025
@meisterT meisterT marked this pull request as ready for review March 16, 2025 18:47
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;

class ScoreboardServiceTest extends KernelTestCase
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the tests here are very relevant as they test internal details. I'd focus on the behaviour tests at the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They helped me uncover a small bug, so I think they were useful. If at one point they become tedious to update because how it is implemented internally frequently (I don't think so, but who knows) we can always delete them.

@eldering
Copy link
Member

eldering commented Mar 17, 2025

What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

I don't think we always have to use these fixed length decimal strings: the sort keys can be different per different contest/scoring type. So in an ICPC scoring setting I think we can just use zero-padded integers.

If the sort key would consist of a simple number, then we could use the native decimal type. But we are using a sort key which is essentially a tuple of fixed precision decimals, allowing to generalize away from ICPC scoring (while making code less scattered and more efficient at the same time). This tuple is of "unknown" length if we want to support more than one scoring method, so we represent it as string.

Side note: bcmath which is the recommended way to work with fixed precision decimal types in PHP, is also using string as underlying representation for individual numbers.

Side note on the side note: separately from this PR, we should look into whether it makes sense using bcmath where we are using decimal(32,9) today, since the mapping from/to float loses precision.

Repeating my code comment here: an alternative would be to let the sort key really be an opaque blob, even without the assumption that it is sorted (lexicographically), but instead have

  1. a separate function (depending on the scoring type) that maps this opaque blob into something that defines a rank ordering
  2. a compare function to define an order these opaque blobs

Then the opaque blob could e.g. be some JSON encoding the scoring state.

OTOH, in case 1 this function just maps into something similar to what's currently stored in the sorting key already, so nothing is really gained. I guess 2 could be a bit more generic, but I do like the fact that we now declare an ordering simply by comparing these sort keys. That indeed simplifies the code a lot. So probably stick to that unless we need more flexibility.

@meisterT
Copy link
Member Author

What is the benefit we get from storing a fixed-precision decimal as string instead of using a native decimal type or integer representation of a fixed-precision decimal?

I don't think we always have to use these fixed length decimal strings: the sort keys can be different per different contest/scoring type. So in an ICPC scoring setting I think we can just use zero-padded integers.

If you care I am happy to change it, but it requires more code (not much of course) for almost no benefit. Who is going to inspect sort keys?

If the sort key would consist of a simple number, then we could use the native decimal type. But we are using a sort key which is essentially a tuple of fixed precision decimals, allowing to generalize away from ICPC scoring (while making code less scattered and more efficient at the same time). This tuple is of "unknown" length if we want to support more than one scoring method, so we represent it as string.
Side note: bcmath which is the recommended way to work with fixed precision decimal types in PHP, is also using string as underlying representation for individual numbers.
Side note on the side note: separately from this PR, we should look into whether it makes sense using bcmath where we are using decimal(32,9) today, since the mapping from/to float loses precision.

Repeating my code comment here: an alternative would be to let the sort key really be an opaque blob, even without the assumption that it is sorted (lexicographically), but instead have

  1. a separate function (depending on the scoring type) that maps this opaque blob into something that defines a rank ordering
  2. a compare function to define an order these opaque blobs

Then the opaque blob could e.g. be some JSON encoding the scoring state.

OTOH, in case 1 this function just maps into something similar to what's currently stored in the sorting key already, so nothing is really gained. I guess 2 could be a bit more generic, but I do like the fact that we now declare an ordering simply by comparing these sort keys. That indeed simplifies the code a lot. So probably stick to that unless we need more flexibility.

I started out with having a separate function and a JSON blob indeed, but then came up with this idea and it was quite a bit less code. If at one point we are realize that this is not powerful enough, it is trivial to change to what you describe. I do like about the current code that sorting when retrieving the scoreboard is purely a database operation, no business logic required.

But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct.

@meisterT
Copy link
Member Author

But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct.

Well, seems like this is not really possible: https://github.com/doctrine/DoctrineMigrationsBundle/pull/559/files

cc @nickygerritsen in case you have ideas


At least we recommend refreshing the cache here: https://www.domjudge.org/docs/manual/8.2/upgrading.html#upgrading

@nickygerritsen
Copy link
Member

But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct.

Well, seems like this is not really possible: doctrine/DoctrineMigrationsBundle#559 (files)

cc @nickygerritsen in case you have ideas

At least we recommend refreshing the cache here: domjudge.org/docs/manual/8.2/upgrading.html#upgrading

Nope I don't think there is an easy way. What we could do is create a CLI command to refresh the scoreboard cache and then run that on make install or in the upgrade command of the SQL script.

@meisterT
Copy link
Member Author

What we could do is create a CLI command to refresh the scoreboard cache and then run that on make install or in the upgrade command of the SQL script.

Good idea, I will do that (separately from this PR).

@meisterT
Copy link
Member Author

see #2971

Copy link
Member

@nickygerritsen nickygerritsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. It looks like it makes things a bit less complex and allows for expansion

Copy link
Member

@vmcj vmcj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?

Progress towards DOMjudge#2525.

The basic idea is to replace a lot of scattered scoring logic (which
might become more complex when we introduce more scoring options such as
optimization problems) by encoding sufficient information within the
existing rank cache.

The information is encoded into a sort key that is designed to be sorted
descendingly. The sort key is a tuple (serialized as string into the
database) of fixed precision decimals. Each tuple uses 9 decimals and is
left padded with `0`s, enabling sorting via standard database query
operations. For ascendingly sorted tuple entries (e.g. penalty time),
values are subtracted from a very large constant - this ensures that
that the key can be sorted as a whole.

For example, with ICPC scoring, the top 3 teams from NWERC 2024 receive
these sort keys:
```
00000000000000000000013.000000000,99999999999999999998725.000000000,99999999999999999999711.000000000
00000000000000000000011.000000000,99999999999999999998918.000000000,99999999999999999999701.000000000
00000000000000000000011.000000000,99999999999999999998916.000000000,99999999999999999999706.000000000
```

This mechanism should facilitate the implementation of other planned
scoring methods, particularly partial scoring and optimization problems,
with reasonable complexity in the business logic.

We do use `bcmath` with fixed precision to avoid numerical precision
issues caused by the order of floating point operations.  While not
critical currently, this will be essential when handling non-integers,
such as those in optimization problems.

The new mechanism also caches some computations and thus improves
scoreboard computation efficiency.
@meisterT
Copy link
Member Author

Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?

It is more or less the same, and other parts dominate the total time, going into some detail below.


Theoretically it does make updates of the cache a tiny bit slower - the good thing here is that this is constant overhead (to construct the sort key) and doesn't scale with the number of teams.

In practice: when refreshing the full NWERC scoreboard from last year it takes in prod mode on my laptop (as median of 11 runs each):

  • on main: ~20.4s
  • this PR: ~20.5s

(But there is a lot of noise, so it is likely not statistically significant.)


Theoretically, it does make retrieving the scoreboard a tiny bit faster as there is less business logic to execute.

In practice, when running

time ( for i in {1..100}; do curl -n -N http://localhost/domjudge/api/contests/nwerc2024/scoreboard > /dev/null; done )

I see:

  • on main: ~18.0s
  • this PR: ~17.9s

(Again there is a lot of noise, so it is likely not statistically significant.)

@eldering
Copy link
Member

Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?

It is more or less the same, and other parts dominate the total time, going into some detail below.

Theoretically it does make updates of the cache a tiny bit slower - the good thing here is that this is constant overhead (to construct the sort key) and doesn't scale with the number of teams.

I guess you could make it faster by keeping the sort keys smaller. For example for the standard ICPC scoring, we don't need decimals, so the string length can be roughly cut in half.

@meisterT
Copy link
Member Author

I guess you could make it faster by keeping the sort keys smaller. For example for the standard ICPC scoring, we don't need decimals, so the string length can be roughly cut in half.

I tried this, also with ints, and there is no difference in timing.

@meisterT meisterT added this pull request to the merge queue Mar 23, 2025
Merged via the queue into DOMjudge:main with commit 6613b28 Mar 23, 2025
36 checks passed
@meisterT meisterT deleted the rankcachev2 branch March 23, 2025 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants