Tarsnap fails at extracting filenames with brackets #329

Closed
scotchi opened this issue Sep 28, 2018 · 11 comments

@scotchi

scotchi commented Sep 28, 2018

If you create an archive containing files with square brackets in their names, you can't extract them by name unless you escape the brackets, even when bypassing the shell.

e.g.

$ mkdir foo
$ touch 'foo/foo[bar]baz'
$ sudo tarsnap -c -f example foo/
                                      Total size  Compressed size
All archives                        2207166128045     591061010434
  (unique data)                       11405443757       2809115624
This archive                                 2491             1409
New data                                     2491             1409
$ sudo tarsnap -tf example
foo/
foo/foo[bar]baz

But then:

$ sudo tarsnap -xvf example 'foo/foo[bar]baz'
tarsnap: foo/foo[bar]baz: Not found in archive
tarsnap: Error exit delayed from previous errors.

It works if you escape the brackets with backslashes, but that doesn't seem like it should be necessary. To rule out any shell expansion I wasn't aware of, I tested it by calling exec directly:

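#include <unistd.h>
#include <stdio.h>

int main(void)
{
    /* No shell is involved: the bracketed name reaches tarsnap unmodified. */
    execlp("/usr/local/bin/tarsnap", "tarsnap", "-tf", "example", "foo/foo[bar]baz", (char *)NULL);
    /* execlp() only returns if it failed to start tarsnap. */
    perror(NULL);
    return 1;
}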

And there the error persists. (This got reported as a redsnapper issue.)

@gperciva
Member

Hmm. This is a difference in behaviour between GNU tar and BSD tar:

$ tar -tf brackets.tar 
aa[bb]cc
$ tar -xf brackets.tar 'aa[bb]cc'
$ bsdtar -xf brackets.tar 'aa[bb]cc'
bsdtar: aa[bb]cc: Not found in archive
bsdtar: Error exit delayed from previous errors.
$ tar --version
tar (GNU tar) 1.28
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
$ bsdtar --version
bsdtar 3.1.2 - libarchive 3.1.2

I can't see any issues for "brackets" or "squares" in the libarchive project (even in closed issues and PRs), so apparently this has never come up before? I'll continue to look into libarchive to see if it's feasible to add it there.

@scotchi
Author

scotchi commented Sep 28, 2018

What I found interesting (and weird) is that it works if it's escaped ... but I can't imagine what the logic is for escaping things when there's no shell expansion. Does libarchive itself do some kind of wildcard expansion?

@cperciva
Member

I believe that we do shell globbing when matching files to extract from an archive.

@gperciva
Member

After editing tar/matching.c:

$ ./tarsnap -xf brackets 'aa[bb]cc'
going to call: fnmatch("aa[bb]cc", "aa[bb]cc", FNM_LEADING_DIR)
result: 1
$ man 3 fnmatch
...
RETURN VALUE
       Zero if string matches pattern, FNM_NOMATCH if there  is  no  match  or
       another nonzero value if there is an error.

@gperciva
Member

(Err, ./tarsnap went on to give the usual errors related to this, as well as the output of my other printf() calls. I edited those out.)

@gperciva
Member

Yeah, it's all about fnmatch. Consider this:

#include <fnmatch.h>
#include <stdio.h>

int main(void)
{
        const char * pattern_easy = "aaa";
        const char * pattern_brackets = "a[b]a";
        const char * pattern;
        int r;

        /* No special characters: the string matches itself. */
        pattern = pattern_easy;
        r = fnmatch(pattern, pattern, 0);
        printf("fnmatch(\"%s\", \"%s\", 0) -> %i\n", pattern, pattern, r);

        /* [b] is a bracket expression, so the string does not match itself. */
        pattern = pattern_brackets;
        r = fnmatch(pattern, pattern, 0);
        printf("fnmatch(\"%s\", \"%s\", 0) -> %i\n", pattern, pattern, r);

        return 0;
}

and then:

$ clang fnm.c && ./a.out 
fnmatch("aaa", "aaa", 0) -> 0
fnmatch("a[b]a", "a[b]a", 0) -> 1

@scotchi
Author

scotchi commented Sep 28, 2018

Which I guess then leaves the question: should users be expected to escape globbing chars on the way in? (Or perhaps should there be a command line switch which turns them off?)

@gperciva
Member

Well, GNU tar behaves differently. I'm looking up whether they still use fnmatch and escape the [] themselves, or use a different function.

@gperciva
Member

OK, they wrote their own fnmatch.c. I don't think we want to go that route.

Unless @cperciva says otherwise, I think the thing to do here is improve our man-page so that users can find out that the include and exclude patterns have some special characters (and that the extra pathnames for -t and -x are treated the same as --include).

@scotchi
Author

scotchi commented Sep 28, 2018

I have gone ahead and worked around this in redsnapper:

directededge/redsnapper@cdc6d37

@gperciva
Member

gperciva commented Oct 3, 2018

Hi @scotchi.

Thanks for the report! We now mention [] explicitly in the man-page, so I think we can close this issue.

@gperciva gperciva closed this as completed Oct 3, 2018