Description
Bug report
Bug description:
It seems that whitespace is allowed everywhere by base64.a85decode, except after the end-of-data delimiter b'~>' in adobe mode:
>>> base64.a85decode(b"6#q'\\F`JTK<-N74;eT`QF!;`!@:O(oDf,~>", adobe=True)
b'Arthur "Two-Sheds" Jackson'
>>> base64.a85decode(b" 6 # q' \\ F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf,~>", adobe=True)
b'Arthur "Two-Sheds" Jackson'
>>> base64.a85decode(b" 6 # q' \\ F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf, ")
b'Arthur "Two-Sheds" Jackson'
>>> base64.a85decode(b" 6 # q' \\ F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf,~> ", adobe=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/base64.py", line 388, in a85decode
    raise ValueError(
ValueError: Ascii85 encoded byte sequences must end with b'~>'
While this behaviour is actually compliant with the very latest PDF standard, including errata, in practice it's quite surprising. It also causes problems because of decades of ambiguous PDF standards, and of implementations that emit and accept extra whitespace as a result of those ambiguities.
A separate but related issue is that some very broken PDF implementations have even been known to insert whitespace between the ~ and > bytes. It may be useful for "Adobe" mode to be tolerant of this as well.
Obviously, also, PostScript doesn't care about extra whitespace after ~> in ASCII85 literal strings. (Note that the leading <~ is only accepted in PostScript and not in PDF.)
Because > is a valid ASCII85 digit, an improved rule would be to accept, in Adobe mode, only input whose end matches the regular expression ~\s*>\s*
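Until the decoder itself changes, the proposed rule can be approximated in user code. The following is only a minimal sketch: a85decode_lenient is a hypothetical helper name, not part of the base64 module, and it assumes the input is a bytes object. It rewrites a trailing ~\s*>\s* to a plain ~> and then hands the result to the existing base64.a85decode.

```python
import base64
import re

# Hypothetical workaround, not part of the standard library.
# Matches "~", optional whitespace, ">", optional whitespace, end of input.
# "~" is not an Ascii85 digit, so it cannot appear inside the payload.
_A85_TRAILER = re.compile(rb"~\s*>\s*\Z")


def a85decode_lenient(data: bytes) -> bytes:
    """Decode Adobe-style Ascii85, tolerating whitespace around the ~> delimiter."""
    normalized, count = _A85_TRAILER.subn(b"~>", data)
    if count == 0:
        # Mirror the strictness of adobe=True when no delimiter is present.
        raise ValueError("Ascii85 data must end with ~> (whitespace allowed)")
    return base64.a85decode(normalized, adobe=True)


# Same input as the failing example above, with a trailing space after ~>.
print(a85decode_lenient(b" 6 # q' \\ F`JTK<-N 7 4 ;eT`QF!;`!@:O(oDf,~> "))
```

Anchoring the pattern at the end of input with \Z keeps any > that occurs earlier in the payload untouched, which matters because > is itself a valid digit.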
CPython versions tested on:
3.11
Operating systems tested on:
Linux