Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed tokenizing multiline template strings #169

Closed
janus926 opened this issue Nov 24, 2014 · 15 comments
Closed

Failed tokenizing multiline template strings #169

janus926 opened this issue Nov 24, 2014 · 15 comments

Comments

@janus926
Copy link

Here's the code for reproducing the issue:

var acorn = require("acorn");

var uglyJS = "var template=document.createElement('template');template.innerHTML=`<style scoped>\n\
    @import url(${stylesheet});</style>\n\
    <content></content>`;"

var token;
var getToken = acorn.tokenize(uglyJS, {
  locations: true,
  ecmaVersion: 6
});
while (true) {
  token = getToken();
  console.log('ttk=' + token.type.keyword + ' ttt=' + token.type.type + ' tv=' + token.value);
  if (token.type.type == 'eof')
    break;
}

Running the script got this:

SyntaxError: Unexpected character '@' (2:4)
at raise (/home/ting/w/fx/os/node_modules/acorn/acorn.js:329:15)
at readToken (/home/ting/w/fx/os/node_modules/acorn/acorn.js:912:7)
at getToken (/home/ting/w/fx/os/node_modules/acorn/acorn.js:222:7)
at Object. (/home/ting/w/fx/os/bug-1102799.js:13:11)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)

@RReverser
Copy link
Member

Will fix, thanks. Same thing that prevented me from adding template strings to loose parser yet.

@janus926
Copy link
Author

Great, thank you :)

@RReverser
Copy link
Member

Well, I investigated if a bit and a problem is that template literal tokens require external state on consumer (i.e. parser) side and without this state (particularly inTemplate flag) will obviously fail. Would it be sufficient for you if tokenizer would return token literals as regular string tokens?

  • Pros: it wouldn't fail without external state
  • Cons: it wouldn't return internals of template string.

As far as I understood from discussion in Bugzilla, you need tokenizer just for beautifying code, so this should work for your case pretty well.

@janus926
Copy link
Author

I am not so sure is acorn used only for beautifying, I just commented on bugzilla to have double confirm.

@RReverser
Copy link
Member

Yeah, I'm tracking and replying in that issue as well :)

@RReverser
Copy link
Member

So could you please tell if such solution would be sufficient for you?

@marijnh
Copy link
Member

marijnh commented Dec 11, 2014

It wouldn't do for the loose parser, though. Would it be possible to have the tokenizer independently track 'embed depth' (by making inTemplate an integer, for example), and use that to get the tokenizing right without help from a parser?

@marijnh
Copy link
Member

marijnh commented Dec 11, 2014

@RReverser Is it okay if I start hacking at the code for a bit now, or are you looking into this at the moment? (Don't want to duplicate work.)

@marijnh
Copy link
Member

marijnh commented Dec 11, 2014

This appears to be fixed by attached patch. Note that this changes the token structure, so if you wrote code against the old token style, which it sounds like you did, you'll have to update it -- there are now acorn.tokTypes.template and acorn.tokTypes.templateContinued tokens with {raw, cooked} objects as value, which span the whole template fragments (including backticks and braces).

(I decided to break compatibility with the old tokens without any safety nets, since the old tokens had only existed in a situation where template tokenizing was too broken to use anyway.)

@RReverser
Copy link
Member

Just got home. @marijnh Interesting solution, thanks for fixing it!

The only thing I don't like is changes to TemplateElement locations which breaks both compatibility and expectations by including quotes in the cases when TemplateElement is on the one or another side of template string.

Also I have a question - how will this work with loose parser (or other tokenizer consumer) in cases when it will call fetchToken.jumpTo(...) and so templates context will become irrelevant?

@marijnh
Copy link
Member

marijnh commented Dec 11, 2014

We include quotes in the location data for strings. I don't see a reason not to do so for template elements.

As for jumpTo, there my idea was to simply keep the existing state and hope for the best. I did implement template strings in the loose parser in 91e5ac0

@RReverser
Copy link
Member

Because TemplateElement is not TemplateLiteral itself, it's just part of it, so it's different case than with strings. For example, template literal: (ugh, those backquotes - can't inline with Markdown)

`My name is ${name} and I am ${title}.`

contains three template elements (== static parts):

  • My name is
  • and I am
  • .

and with current implementation we have quotes included into range of My name is and . while they are just the same static subparts as and I am in the middle:

  • 'My name is

  • and I am

  • .'

    So IMO it doesn't make sense to have different treatment for locations of parts on the sides of template literal and those in the middle, otherwise we get inconsistency here and can't rely that TemplateElement.range[start..end] includes only string itself and nothing more.

@RReverser
Copy link
Member

Regarding template literals in loose parser - I saw that it works, was just wondering about error-recovery cases when jumpTo might occur. However, template literal is basically multi-line string so as soon as we are inside static part of it, we have no way to determine whether we have got a syntax error or if it's just a string. So probably jumpTo from inside of it will never occur. (let's hope so)

marijnh added a commit that referenced this issue Dec 11, 2014
@marijnh
Copy link
Member

marijnh commented Dec 11, 2014

That's a reasonable point I guess. It's somewhat awkward to do, now that the start of the TemplateElement is no longer a token boundary, but I've implemented it in the attached patch.

@RReverser
Copy link
Member

now that the start of the TemplateElement is no longer a token boundary

Yeah, that's why initially I treated back quote as separate token (so boundaries of template element token and node matched). Thanks for fixing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants