Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use COMP to ensure big-endian binary #267

Open
GitMensch opened this issue Jan 6, 2025 · 2 comments
Open

use COMP to ensure big-endian binary #267

GitMensch opened this issue Jan 6, 2025 · 2 comments

Comments

@GitMensch
Copy link
Contributor

Background: COMP-5 (and the BINARY-.... usages) defines always "machine-native" binary integers, being stored as little-endian on PC; the Minecraft protocol reasonably uses "network order" (big-endian).

Currently the encoding functions use FUNCTION REVERSE to embed the integers with a switched byte-order. Despite portability problems (that just breaks any big-endian machine and would need conditional compilation, see #211) this is also something to consider to adjust for performance reasons.

Any intrinsic function needs to create an intermediate field of appropriate size, then copy the resulting data in.
Without doing a proper perf analysis I'm already quite confident that the actual reversing is very fast (because the sizes are <=8 bytes) and the creation of the intermediate fields, while fast (and again faster in 3.2 than 3.1) takes the most time of those MOVE statements.

Possible solutions:

  • define an additional WORKING-STORAGE field matching the BINARY-... field's size in its PICTURE clause, but with USAGE COMP; use a plain MOVE to get a guaranteed big-endian binary, which can be directly used in a second MOVE (conditional compilation could already emit that move on big-endian, but until there are actual "users" on those environments I'd suggest to just add an inline-comment about that for now)
  • at least if the integers that are encoded are normally "set / computed" then directly "encoded", it could be more beneficial to define the origin as COMP instead of BINARY - on little-endian machines GnuCOBOL will use byte-swapping to get the integer value, do the computation, then use a byte-swapping assignment to the variable; those "inline" byte-swapping functions are very fast and their additional cpu-usage would be much less than a separate variable that needs to be assigned for the encode functions

If there are many computations/checks on those variables, then the solution 1 (which is also much less change) would be preferable; if the variables are nearly always encoded after its use, then solution 2 should be faster.
The good thing: there can be an easy and scoped change (solution 1) for now, and later on a good reproducible test could be defined that can be profiled / run through perf before solution 2 is applied "for testing purposes", then be rechecked for its effect before a decision is made to drop the change or keep it.

Also note that internally GnuCOBOL uses byte-swapping internals (as available in the C runtime / compiler. and only if necessary), which are much more optimized in the C runtime (possibly in cpu code) than moving single bytes around.

@meyfa
Copy link
Owner

meyfa commented Jan 6, 2025

Current implementation:

IDENTIFICATION DIVISION.
PROGRAM-ID. Encode-Int.

DATA DIVISION.
WORKING-STORAGE SECTION.
    01 BYTE-COUNT           BINARY-CHAR UNSIGNED            VALUE 4.
    01 VALUE-BYTES.
        02 VALUE-NUMERIC    BINARY-LONG.
LINKAGE SECTION.
    01 LK-VALUE-IN          BINARY-LONG.
    01 LK-VALUE-OUT         PIC X ANY LENGTH.
    01 LK-OUT-OFFSET        BINARY-LONG UNSIGNED.

PROCEDURE DIVISION USING LK-VALUE-IN LK-VALUE-OUT LK-OUT-OFFSET.
    MOVE LK-VALUE-IN TO VALUE-NUMERIC
    MOVE FUNCTION REVERSE(VALUE-BYTES) TO LK-VALUE-OUT(LK-OUT-OFFSET:BYTE-COUNT)
    ADD BYTE-COUNT TO LK-OUT-OFFSET
    GOBACK.

END PROGRAM Encode-Int.

Implementation with COMP:

  • replace 02 VALUE-NUMERIC BINARY-LONG with 02 VALUE-NUMERIC PIC S9(9) USAGE COMP
  • replace MOVE FUNCTION ... with plain MOVE VALUE-BYTES TO LK-VALUE-OUT(LK-OUT-OFFSET:BYTE-COUNT)

Running both versions 100,000,000 times:

  • current version takes approx. 7.7 seconds
  • COMP version takes approx. 4.7 seconds

Seems like a good 40% time save 👍

@GitMensch
Copy link
Contributor Author

GitMensch commented Jan 6, 2025

While you are at it, I'd suggest to drop
01 BYTE-COUNT BINARY-CHAR UNSIGNED VALUE 4.
and instead use
ADD LENGTH OF VALUE-BYTES TO LK-OUT-OFFSET
(that is an internal register and on GnuCOBOL and IBM replaced at compile time by - in this case - 4); this helps to have a nearly identical code - if you want to have it you can have a copybook that you include multiple times just COPY "EncInt" REPLACE ==prog-id== BY ==Encode-Int==, ==digits== by ==9==..

... but as you need that constant twice, possibly still create a level 78 for that.

That's the "non-OO OO way" :-) [and as in OO or when programming with templates, not necessarily "easy to understand"].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants