Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use split_windows_volume_prefix() in split_windows() and new is_absolute_windows(). #5

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

aznashwan
Copy link

@aznashwan aznashwan commented Mar 28, 2024

WARN: this code may NOT run correctly on OTP because of this compiler bug: gleam-lang/gleam#2733

Depends on (and includes) the changes in: #7

This patch adds a new is_absolute_windows() function and also switches split_windows() to use the new split_windows_volume_prefix() function.

BREAKING: Note that split_windows()(and by extension split() on Windows) will no longer auto-lowercase the drive letter, nor include the slash in the first drive element of the split as it used to.

Most notable is the addition of the split_windows_volume_prefix() function for (hopefully) complete coverage of all possible Windows volume prefix formats.

This is mainly a side-effect of it being also converted to use the new split_windows_volume_prefix(), but IMO it should never had taken said steps to begin with, as these behaviors broke the expected tautology that: path == string.join("/", split(path)) should always be true. (presuming the slash orientation is correct)

PS: this is my first real Gleam language patch, so please be as harsh as necessary. 😄

@aznashwan aznashwan force-pushed the windows-improvements branch 2 times, most recently from c97586b to 642f2fe Compare March 29, 2024 00:44
@lpil
Copy link
Owner

lpil commented Mar 29, 2024

Hello! What's the motivation for these changes?
And could you provide references that demonstrate that the new behaviour is correct? Some of the changes mean this library deviates from the references which were being used previously (see the top of the module) which is a little worrying for me.

Thank you.

@aznashwan
Copy link
Author

Hello, and thanks for taking the time!
I presume the main issue needing discussion is the behavioral change to split() on Windows, so I'll try to motivate that.

Motivation

I would argue that there are a bunch of issues with how split() currently works on Windows which makes it "less usable":

  1. string.join(filepath.split(path), with: "/") == path should always be True IMO, it's just one of those tautologies I believe everyone expects when using a filepath splitting function (I do, at least). The current implementation will not do this because:

    1. it lowercase's the drive letter (i.e. split("C:\\abc") == ["c:\\", "abc"]), which I strongly believe it has no business doing. A split function should only split IMO, not modify the input in any other way
    2. it's including the trailing path separator in the first element, which is does not do for the rest of the "middle elements", which will make string.join()-ing the path elements back lead to an extra (back)slash after the drive portion.
    3. even if we decide including the path separator is correct, it's transforming backslashes into regular slashes: filepath.split("C:\\Users") == ["c:/", "Users"]
  2. a filepath.split() function should split a path based on actual path semantics, not based on string semantics.
    Indeed, on Unix systems the path and string semantics happen to be equivalent (/ splits path elements, end of story).
    On Windows, however, the splitting needs to take into account all the possible semantics of the leading volume specifier.
    A person running this on Windows would IMO have zero reason to use filepath.split("//127.0.0.1/share/abc") if the result is identical to just doing a string.split(_, "/") on it, it brings zero added value over the simple string split...

Concession on Erl compatibility

Some of the changes mean this library deviates from the references which were being used previously (see the top of the module)

I admit that I hadn't noticed the references at the start of the file and wasn't aware this was trying to emulate the behavior of existing split() Erlang functions, my apologies!

If the goal was always to maintain behavioral compatibility with the Erl functions, I can gladly revert split() to what it used to be (minus point 1.i which IMHO is absolutely incorrect).

However, I firmly believe we should not let that stop this package from being "as correct as possible".

Conclusion

Even if the priority is compatibility with the Erl function, we should at the very very least fix points 1.i and 1.iii IMO, just let me know!

@aznashwan
Copy link
Author

FWIW, I will add some examples of split_windows() failing:

import gleam/io
import gleam/list
import gleam/string

import filepath

pub fn main() {
    let unc_root = "//./"
    let plain = "//./pipe"
    let slashed = "//./pipe/testpipe"
    let slashed_drive = "C:/"
    let backlashed_drive = "C:\\"
    let backslashed = "\\\\.\\pipe\\testpipe\\test"
    let driveletter = "C:\\Users\\Administrator/test"
    let unc_ip = "\\\\127.0.0.1\\MyShare\\shared-dir\\file.txt"
    let unc_path = "\\\\.\\UNC\\LOCALHOST\\c$\\temp\\test-file.txt"

    let all_test_paths = [
        unc_root, plain, slashed, slashed_drive, backlashed_drive, backslashed,
        driveletter, unc_ip, unc_path
    ]

    let testf = fn(path) {
        io.println("Splitting: '" <> path <> "'")
        io.debug(filepath.split_windows(path))
        io.println("")
    }
    list.map(all_test_paths, testf)
}

This produces this output:

Splitting: '//./'
["/", "", ".", ""]

Splitting: '//./pipe'
["/", "", ".", "pipe"]

Splitting: '//./pipe/testpipe'
["/", "", ".", "pipe", "testpipe"]

Splitting: 'C:/'
["c:/", ""]

Splitting: 'C:\'
["c:/", ""]

Splitting: '\\.\pipe\testpipe\test'
["/", "", ".", "pipe", "testpipe", "test"]

Splitting: 'C:\Users\Administrator/test'
["c:/", "Users", "Administrator", "test"]

Splitting: '\\127.0.0.1\MyShare\shared-dir\file.txt'
["/", "", "127.0.0.1", "MyShare", "shared-dir", "file.txt"]

Splitting: '\\.\UNC\LOCALHOST\c$\temp\test-file.txt'
["/", "", ".", "UNC", "LOCALHOST", "c$", "temp", "test-file.txt"]

@aznashwan
Copy link
Author

aznashwan commented Mar 30, 2024

Additionally FWIW, although I don't believe 1:1 comparisons to the stdlib of other languages is infallible, I'd like point out that Go's path/filepath.Split() function correctly handles these cases.

NOTE: Go's version only splits the path once at its end, returning 2 strings, and it's meant to implement the tautology that:

dir, rest := filepath.Split(path)
path == (dir + rest) // should always be true

Sample:

package main

import "fmt"
import "path/filepath"
import "runtime"

func main() {
	if runtime.GOOS != "windows" {
		panic("These examples are only relevant when running on Windows.")
	}

	fmt.Printf("Go version this was built with is: %s\n\n", runtime.Version())

	examples := []string{
		"C:\\123",
		"c:\\123",
		"C:/123",
		"c:/123",
		"C:\\",
		"C:/",
		"HKLM:\\SOFTWARE",
		"HKLM:/SOFTWARE",
		"\\\\.\\pipe\\test",
		"//./pipe/test",
		"//127.0.0.1/abc/123",
		"\\\\127.0.0.1\\abc\\123",

		// Notable examples where Split() correctly identifies the volume and does NOT split it:
		"//127.0.0.1",
		"//127.0.0.1/",
		"//127.0.0.1/abc",
		"\\\\127.0.0.1\\abc",
		"\\\\.\\pipe",
		"//./pipe",
	}

	for _, example := range examples {
		dir, file := filepath.Split(example)
		fmt.Printf("%q => %#v\n", example, []string{dir, file})
	}
}

Output:

Go version this was built with is: go1.21.1

"C:\\123" => []string{"C:\\", "123"}
"c:\\123" => []string{"c:\\", "123"}
"C:/123" => []string{"C:/", "123"}
"c:/123" => []string{"c:/", "123"}
"C:\\" => []string{"C:\\", ""}
"C:/" => []string{"C:/", ""}
"HKLM:\\SOFTWARE" => []string{"HKLM:\\", "SOFTWARE"}
"HKLM:/SOFTWARE" => []string{"HKLM:/", "SOFTWARE"}
"\\\\.\\pipe\\test" => []string{"\\\\.\\pipe\\", "test"}
"//./pipe/test" => []string{"//./pipe/", "test"}
"//127.0.0.1/abc/123" => []string{"//127.0.0.1/abc/", "123"}
"\\\\127.0.0.1\\abc\\123" => []string{"\\\\127.0.0.1\\abc\\", "123"}

# Notable examples where Split() correctly identifies the volume and does NOT split it:
"//127.0.0.1" => []string{"//127.0.0.1", ""}
"//127.0.0.1/" => []string{"//127.0.0.1/", ""}
"//127.0.0.1/abc" => []string{"//127.0.0.1/abc", ""}
"\\\\127.0.0.1\\abc" => []string{"\\\\127.0.0.1\\abc", ""}
"\\\\.\\pipe" => []string{"\\\\.\\pipe", ""}
"//./pipe" => []string{"//./pipe", ""}

Copy link
Owner

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you.

Could you split this pull request into one per change or fix or new feature please? There's quite a lot of different things in this one which makes it challenging to review and the controversial changes will block the uncontroversial.

Could you also run the formatter and update the changelog in each. Thank you

This patch adds a new public function for splitting Windows paths
into its volume prefix and rest of the path.
Note this introduces the breaking change of `split_windows()` and
`split()` no longer lowercasing the drive letter, nor keeping a
trailing slash in the first element representing the drive.
@aznashwan aznashwan force-pushed the windows-improvements branch from 642f2fe to ceb6a12 Compare April 1, 2024 17:04
@aznashwan aznashwan changed the title Various Windows additions/improvements. Use split_windows_volume_prefix() in split_windows() and new is_absolute_windows(). Apr 1, 2024
@aznashwan aznashwan force-pushed the windows-improvements branch from ceb6a12 to c00fb17 Compare April 1, 2024 17:06
@aznashwan
Copy link
Author

I have went ahead and split the initial PR into:

@aznashwan
Copy link
Author

aznashwan commented Apr 1, 2024

PS: Although I made #7 bump the minor version to 1.1.0, I can't decide whether this patch here warrants another minor bump or a full 2.0.0 release because of the changes to split{_windows}().

IMO the current way split_windows() works is incorrect so I would make the argument this counts as a fix and not warrant a major "feature" bump to 2.0, but I'll defer that decision to you...

Copy link
Owner

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: oops, sorry I see this is the original PR.

@aznashwan
Copy link
Author

aznashwan commented Apr 1, 2024

Indeed, after #7 merges this PR will be revealed to have minimal changes:

This patch adds a new is_absolute_windows() function and also switches split_windows() to use the new split_windows_volume_prefix() function.

I could split the switching of split_windows() and is_absolute_windows() to separate PRs if you insist but I didn't see much value in that since we're talking like 50LoC anyway...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants