-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dedenting long-strings with Windows EOLs #1534
Support dedenting long-strings with Windows EOLs #1534
Conversation
I should have added that I was prompted to look into this again after it was reported in pyrmont/documentarian#13 that a template I define using a long-string was producing unexpected output on Windows. There are workarounds to this particular issue but, from my perspective, it would be preferable to allow Windows users to enjoy the same dedenting behaviour that is available on other platforms. |
I understand the frustration here, but the correct solution is to make sure your source control is not configured to mangle source files.
This is a far bigger problem. Even if the dedenting algorithm is updated in this particular case, you are still actually emitting string constants on Windows and Non-Windows platform, as we don't discard newlines, just move the whitespace around them. Fixing dedenting here is like hot gluing the bumper back on after you get into a wreck so you can drive home. Languages like C that basically treat all whitespace the same get around this by simply not caring, but even then, it is a problem. Let's say as part of our build process we take a hash of our source code (we do this in Janet) and embed it in the binary - with the conversion behavior, we get different hashes on Windows and everything else! You can add a EDIT: Adding
to .gitattributes should prevent this |
I would like to suggest that I am curious what the use case is for needing a binary embedded hash to match one running in a different OS? Edit: as I was thinking about it, the first time I encountered long string indenting, I actually just assumed that was the default as the documentation does say "all bytes will be parsed literally until the ending delimiter is found" so this made sense that the indents would translate Edit Edit: Humorously, I saw a rant on reddit from seven years ago suggesting that Git is last bastion of CRLF issues on Windows |
@bakpakin wrote:
The circumstances in which the issue becomes visible to non-Windows users is because of Git's default behaviour. I agree that if this were the fundamental issue, the correct solution would be to address Git's behaviour. But this isn't the fundamental issue. If a Windows user creates their own project in Janet, never clones any source files and simply writes an indented long-string, they'll encounter this problem. Janet dedenting long-strings is one of those lovely quality of life decisions you've made that I sincerely believe all users should be able to enjoy regardless of the EOL sequence their platform uses. Janet's Windows support is absolutely first class. I've moved away from Windows since growing up with it as a teenager but I remember trying to use scripting languages like Ruby and PHP and being bitterly disappointed how frequently things simply didn't work. I would have loved Janet back then. I think if we can maintain that support at minimal impact to the runtime's source we should. |
I guess the only reason not to make this change would be change existing code on windows, but frankly it seems like an innocuous change in almost all cases. And there could also be users who reject \n for newlines and insist on \r\n everywhere (perhaps they are working with HTTP and cgi). For completeness, perhaps we should also recognize \r newlines. |
This PR fixes the dedenting behaviour of long-strings that use Windows EOLs (i.e.
\r\n
). It fixes #1502.Background
Janet supports dedenting long-strings (i.e. strings delimited by one or more
`
characters). This is particularly useful for docstrings where the docstring will usually be indented to preserve visual hierarchy.As reported by @CFiggers in #1502, this practice ‘breaks’ with files that use
\r\n
as the EOL sequence. This is because Janet’s dedenting algorithm assumes the EOL sequence will be\n
:janet/src/core/parse.c
Lines 361 to 407 in 4daecc9
By default, Git will use
\r\n
when cloning repositories on Windows even if the files in the repository used only\n
. Git’s behaviour can be modified but requires affirmative action from the user.Implementation
The dedenting algorithm in the
src/core/parse.c
file is modified to support\r\n
as an EOL sequence. In essence, this is done by including a test after the for-loop that checks indentation. If the for-loop ended because it hit a\r
character,reindent
is set back to1
(i.e.true
) if the next character is a\n
so that the while-loop continues.The indenting tests that are in the test suite are expanded to ensure that long-strings that use
\r\n
as an EOL sequence work as intended.Notes
As noted by @sogaiu in #1302, different approaches could be taken to normalising EOL sequences. This PR does not do this. If the long-string contained Windows-style EOL sequences, these are preserved in the dedented output.