Failed tokenizing multiline template strings #169

janus926 · 2014-11-24T09:53:50Z

Here's the code for reproducing the issue:

var acorn = require("acorn");

var uglyJS = "var template=document.createElement('template');template.innerHTML=`<style scoped>\n\
    @import url(${stylesheet});</style>\n\
    <content></content>`;"

var token;
var getToken = acorn.tokenize(uglyJS, {
  locations: true,
  ecmaVersion: 6
});
while (true) {
  token = getToken();
  console.log('ttk=' + token.type.keyword + ' ttt=' + token.type.type + ' tv=' + token.value);
  if (token.type.type == 'eof')
    break;
}

Running the script got this:

SyntaxError: Unexpected character '@' (2:4)
at raise (/home/ting/w/fx/os/node_modules/acorn/acorn.js:329:15)
at readToken (/home/ting/w/fx/os/node_modules/acorn/acorn.js:912:7)
at getToken (/home/ting/w/fx/os/node_modules/acorn/acorn.js:222:7)
at Object.<anonymous> (/home/ting/w/fx/os/bug-1102799.js:13:11)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)

RReverser · 2014-11-24T10:08:35Z

Will fix, thanks. It's the same thing that has prevented me from adding template strings to the loose parser so far.

janus926 · 2014-11-24T10:18:17Z

Great, thank you :)

RReverser · 2014-12-05T09:34:48Z

Well, I investigated it a bit, and the problem is that template literal tokens require external state on the consumer (i.e. parser) side; without this state (particularly the inTemplate flag) tokenizing will obviously fail. Would it be sufficient for you if the tokenizer returned template literals as regular string tokens?

Pros: it wouldn't fail without external state
Cons: it wouldn't return internals of template string.

As far as I understood from discussion in Bugzilla, you need tokenizer just for beautifying code, so this should work for your case pretty well.

janus926 · 2014-12-10T08:20:54Z

I'm not sure whether acorn is used only for beautifying; I just commented on Bugzilla to double-check.

RReverser · 2014-12-10T12:15:41Z

Yeah, I'm tracking and replying in that issue as well :)

RReverser · 2014-12-11T09:53:36Z

So could you please tell me whether such a solution would be sufficient for you?

marijnh · 2014-12-11T10:01:43Z

It wouldn't do for the loose parser, though. Would it be possible to have the tokenizer independently track 'embed depth' (by making inTemplate an integer, for example), and use that to get the tokenizing right without help from a parser?
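To make the 'embed depth' idea concrete, here is a minimal standalone sketch of a scanner that keeps its own template state and needs no help from a parser. It is not acorn's code: the function name `splitTemplateParts`, the `braces` stack and the `{type, text}` result shape are made up for illustration. It also ignores string literals, comments and regexps inside the embedded code, which a real tokenizer would have to handle.

```javascript
// Hypothetical sketch, not acorn's implementation.
// Splits source text into "code" pieces and "static" template pieces,
// tracking template nesting inside the scanner itself.
function splitTemplateParts(src) {
  var out = [];         // collected {type, text} pieces
  var braces = [];      // one counter per open "${": unmatched "{" inside it
  var inStatic = false; // true while reading a static template part
  var start = 0;        // start offset of the piece being read

  function emit(type, end) {
    if (end > start) out.push({type: type, text: src.slice(start, end)});
    start = end;
  }

  for (var i = 0; i < src.length; i++) {
    var ch = src[i];
    if (inStatic) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === "`") {
        emit("static", i); start = i + 1; inStatic = false;
      } else if (ch === "$" && src[i + 1] === "{") {
        emit("static", i); i++; start = i + 1;
        braces.push(0); inStatic = false;
      }
    } else if (ch === "`") {
      emit("code", i); start = i + 1; inStatic = true;
    } else if (ch === "{" && braces.length) {
      braces[braces.length - 1]++;
    } else if (ch === "}" && braces.length) {
      if (braces[braces.length - 1] === 0) {
        // This "}" closes the innermost "${", so we are back in static text.
        emit("code", i); start = i + 1;
        braces.pop(); inStatic = true;
      } else {
        braces[braces.length - 1]--;
      }
    }
  }
  emit(inStatic ? "static" : "code", src.length);
  return out;
}

var parts = splitTemplateParts("x=`<style>\n  @import url(${stylesheet});</style>`;");
console.log(parts.map(function (p) { return p.type; }).join(","));
```

The sketch uses a stack of counters rather than a single integer so that an object literal or block inside `${...}` does not end the embed early; whether a plain integer is enough depends on how the surrounding tokenizer already tracks braces.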

marijnh · 2014-12-11T10:07:46Z

@RReverser Is it okay if I start hacking at the code for a bit now, or are you looking into this at the moment? (Don't want to duplicate work.)

marijnh · 2014-12-11T13:32:52Z

This appears to be fixed by the attached patch. Note that this changes the token structure, so if you wrote code against the old token style, which it sounds like you did, you'll have to update it -- there are now acorn.tokTypes.template and acorn.tokTypes.templateContinued tokens with {raw, cooked} objects as their value, which span the whole template fragments (including backticks and braces).

(I decided to break compatibility with the old tokens without any safety nets, since the old tokens had only existed in a situation where template tokenizing was too broken to use anyway.)

RReverser · 2014-12-11T15:44:48Z

Just got home. @marijnh Interesting solution, thanks for fixing it!

The only thing I don't like is the change to TemplateElement locations, which breaks both compatibility and expectations by including the quotes when a TemplateElement sits at either end of a template string.

Also, I have a question: how will this work with the loose parser (or any other tokenizer consumer) when it calls fetchToken.jumpTo(...) and the template context becomes irrelevant?

marijnh · 2014-12-11T15:57:37Z

We include quotes in the location data for strings. I don't see a reason not to do so for template elements.

As for jumpTo, there my idea was to simply keep the existing state and hope for the best. I did implement template strings in the loose parser in 91e5ac0

RReverser · 2014-12-11T16:09:15Z

Because TemplateElement is not TemplateLiteral itself, it's just part of it, so it's different case than with strings. For example, template literal: (ugh, those backquotes - can't inline with Markdown)

`My name is ${name} and I am ${title}.`

contains three template elements (== static parts):

My name is
and I am
.

and with the current implementation the quotes are included in the ranges of "My name is " and ".", even though these are static subparts just like " and I am " in the middle:

'My name is
and I am
.'

So IMO it doesn't make sense to treat the locations of the parts at the ends of a template literal differently from those in the middle; otherwise we get an inconsistency and can't rely on TemplateElement.range[start..end] covering only the string itself and nothing more.
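As a rough illustration of the consistent behaviour argued for here, the sketch below computes ranges for the static parts of a single template literal so that none of them includes a backtick, `${` or `}`. The helper name `staticRanges` and the `[start, end)` offset pairs are assumptions made for this example, not acorn's API, and braces inside strings or comments within an expression are not handled.

```javascript
// Hypothetical helper, not part of acorn.
// Returns [start, end) offsets of the static parts of one template literal,
// excluding the backticks and the "${" / "}" delimiters.
function staticRanges(literal) {
  var ranges = [];
  var depth = 0; // > 0 while inside a "${ ... }" expression
  var start = 1; // skip the opening backtick
  for (var i = 1; i < literal.length - 1; i++) {
    var ch = literal[i];
    if (depth === 0) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === "$" && literal[i + 1] === "{") {
        ranges.push([start, i]); // static part ends before "${"
        depth = 1;
        i++;
      }
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}" && --depth === 0) {
      start = i + 1; // next static part begins after "}"
    }
  }
  ranges.push([start, literal.length - 1]); // stop before the closing backtick
  return ranges;
}

var literal = "`My name is ${name} and I am ${title}.`";
staticRanges(literal).forEach(function (r) {
  console.log(JSON.stringify(literal.slice(r[0], r[1])), r);
});
```

With ranges computed this way, the first and last parts are treated exactly like the middle one, so slicing the source by a range always yields only the string content.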

RReverser · 2014-12-11T16:12:47Z

Regarding template literals in the loose parser: I saw that it works, I was just wondering about error-recovery cases where jumpTo might occur. However, a template literal is basically a multi-line string, so once we are inside a static part of it we have no way to determine whether we have hit a syntax error or it's just a string. So a jumpTo from inside one will probably never occur. (let's hope so)


marijnh · 2014-12-11T16:45:02Z

That's a reasonable point I guess. It's somewhat awkward to do, now that the start of the TemplateElement is no longer a token boundary, but I've implemented it in the attached patch.

RReverser · 2014-12-11T16:48:07Z

> now that the start of the TemplateElement is no longer a token boundary

Yeah, that's why initially I treated back quote as separate token (so boundaries of template element token and node matched). Thanks for fixing it!

marijnh closed this as completed in 2cb3dbc Dec 11, 2014

marijnh added a commit that referenced this issue Dec 11, 2014

Give TemplateElements a narrower range

6915519

Issue #169
