-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce the concept of sort keys. #2954
Conversation
bd7357f
to
eb4cf58
Compare
7d4ce38
to
a16f23d
Compare
What is the benefit we get from storing a fixed-precision decimal as string instead of using a native |
If the sort key would consist of a simple number, then we could use the native Side note: Side note on the side note: separately from this PR, we should look into whether it makes sense using |
2699ad8
to
925c0a8
Compare
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase; | ||
|
||
class ScoreboardServiceTest extends KernelTestCase | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the tests here are very relevant as they test internal details. I'd focus on the behaviour tests at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They helped me uncover a small bug, so I think they were useful. If at one point they become tedious to update because how it is implemented internally frequently (I don't think so, but who knows) we can always delete them.
I don't think we always have to use these fixed length decimal strings: the sort keys can be different per different contest/scoring type. So in an ICPC scoring setting I think we can just use zero-padded integers.
Repeating my code comment here: an alternative would be to let the sort key really be an opaque blob, even without the assumption that it is sorted (lexicographically), but instead have
Then the opaque blob could e.g. be some JSON encoding the scoring state. OTOH, in case 1 this function just maps into something similar to what's currently stored in the sorting key already, so nothing is really gained. I guess 2 could be a bit more generic, but I do like the fact that we now declare an ordering simply by comparing these sort keys. That indeed simplifies the code a lot. So probably stick to that unless we need more flexibility. |
If you care I am happy to change it, but it requires more code (not much of course) for almost no benefit. Who is going to inspect sort keys?
I started out with having a separate function and a JSON blob indeed, but then came up with this idea and it was quite a bit less code. If at one point we are realize that this is not powerful enough, it is trivial to change to what you describe. I do like about the current code that sorting when retrieving the scoreboard is purely a database operation, no business logic required. But this discussion reminds me that I have to add to the migration a scoreboard refresh so that the data is correct. |
Well, seems like this is not really possible: https://github.com/doctrine/DoctrineMigrationsBundle/pull/559/files cc @nickygerritsen in case you have ideas At least we recommend refreshing the cache here: https://www.domjudge.org/docs/manual/8.2/upgrading.html#upgrading |
Nope I don't think there is an easy way. What we could do is create a CLI command to refresh the scoreboard cache and then run that on |
Good idea, I will do that (separately from this PR). |
see #2971 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. It looks like it makes things a bit less complex and allows for expansion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this make any parts faster or slower? How will scoreboards with 1000 teams be affected?
Progress towards DOMjudge#2525. The basic idea is to replace a lot of scattered scoring logic (which might become more complex when we introduce more scoring options such as optimization problems) by encoding sufficient information within the existing rank cache. The information is encoded into a sort key that is designed to be sorted descendingly. The sort key is a tuple (serialized as string into the database) of fixed precision decimals. Each tuple uses 9 decimals and is left padded with `0`s, enabling sorting via standard database query operations. For ascendingly sorted tuple entries (e.g. penalty time), values are subtracted from a very large constant - this ensures that that the key can be sorted as a whole. For example, with ICPC scoring, the top 3 teams from NWERC 2024 receive these sort keys: ``` 00000000000000000000013.000000000,99999999999999999998725.000000000,99999999999999999999711.000000000 00000000000000000000011.000000000,99999999999999999998918.000000000,99999999999999999999701.000000000 00000000000000000000011.000000000,99999999999999999998916.000000000,99999999999999999999706.000000000 ``` This mechanism should facilitate the implementation of other planned scoring methods, particularly partial scoring and optimization problems, with reasonable complexity in the business logic. We do use `bcmath` with fixed precision to avoid numerical precision issues caused by the order of floating point operations. While not critical currently, this will be essential when handling non-integers, such as those in optimization problems. The new mechanism also caches some computations and thus improves scoreboard computation efficiency.
It is more or less the same, and other parts dominate the total time, going into some detail below. Theoretically it does make updates of the cache a tiny bit slower - the good thing here is that this is constant overhead (to construct the sort key) and doesn't scale with the number of teams. In practice: when refreshing the full NWERC scoreboard from last year it takes in prod mode on my laptop (as median of 11 runs each):
(But there is a lot of noise, so it is likely not statistically significant.) Theoretically, it does make retrieving the scoreboard a tiny bit faster as there is less business logic to execute. In practice, when running
I see:
(Again there is a lot of noise, so it is likely not statistically significant.) |
I guess you could make it faster by keeping the sort keys smaller. For example for the standard ICPC scoring, we don't need decimals, so the string length can be roughly cut in half. |
I tried this, also with ints, and there is no difference in timing. |
Progress towards #2525.
The basic idea is to replace a lot of scattered scoring logic (which
might become more complex when we introduce more scoring options such as
optimization problems) by encoding sufficient information within the
existing rank cache.
The information is encoded into a sort key that is designed to be sorted
descendingly. The sort key is a tuple (serialized as string into the
database) of fixed precision decimals. Each tuple uses 9 decimals and is
left padded with
0
s, enabling sorting via standard database queryoperations. For ascendingly sorted tuple entries (e.g. penalty time),
values are subtracted from a very large constant - this ensures that
that the key can be sorted as a whole.
For example, with ICPC scoring, the top 3 teams from NWERC 2024 receive
these sort keys:
This mechanism should facilitate the implementation of other planned
scoring methods, particularly partial scoring and optimization problems,
with reasonable complexity in the business logic.
We are using
bcmath
with fixed precision to avoid numerical precisionissues caused by the order of floating point operations. While not
critical currently, this will be essential when handling non-integers,
such as those in optimization problems.
The new mechanism also caches some computations and thus improves
scoreboard computation efficiency.