I think the behavior around token splitting should be documented in some way, though I'm uncertain exactly how. The general idea is that the lexer chapter defines tokens, such as `<-`, which are then split into smaller tokens in certain situations. AFAIK, the reference doesn't really define how this works. I also don't know how much of this is part of the lexer versus the parser; I suspect it needs to be addressed in both, depending on which approach we take.
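Nested generics are the classic case: the lexer produces a single `>>` token for two adjacent closing angle brackets, and the parser must treat it as two `>` tokens.

```rust
fn main() {
    // The lexer sees one `>>` token at the end of the type annotation,
    // but the parser splits it into two `>` tokens so that both `Vec`
    // generic argument lists can be closed.
    let nested: Vec<Vec<u8>> = Vec::new();
    assert!(nested.is_empty());
}
```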
A question I have is whether it should be documented as tokens being split or as tokens being glued, since either approach seems viable.
I think in rustc, the primary method currently used to handle this is `break_and_eat`. However, I may be misremembering; there may be other places where this is handled.
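To illustrate the "splitting" direction, here is a hypothetical sketch (the enum and function names are invented for illustration; this is not rustc's actual `break_and_eat` implementation): when the parser expects a lone `>` but sees a compound token starting with `>`, it breaks off the first character and keeps the remainder as a new token.

```rust
// Invented token kinds for the sketch; rustc's real representation differs.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Gt,     // `>`
    GtGt,   // `>>`
    GtEq,   // `>=`
    GtGtEq, // `>>=`
    Eq,     // `=`
}

/// Split a `>` off the front of a compound token, returning the `>` plus
/// whatever token remains (if any). Returns `None` if the token does not
/// start with `>`.
fn break_first_gt(tok: &Token) -> Option<(Token, Option<Token>)> {
    match tok {
        Token::Gt => Some((Token::Gt, None)),
        Token::GtGt => Some((Token::Gt, Some(Token::Gt))),
        Token::GtEq => Some((Token::Gt, Some(Token::Eq))),
        Token::GtGtEq => Some((Token::Gt, Some(Token::GtEq))),
        _ => None,
    }
}

fn main() {
    // Closing two generic argument lists followed by `=`:
    // `>>=` splits into `>` plus a remaining `>=`, which can be split again.
    assert_eq!(
        break_first_gt(&Token::GtGtEq),
        Some((Token::Gt, Some(Token::GtEq)))
    );
    assert_eq!(break_first_gt(&Token::Eq), None);
}
```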
IIUC, the places where this splitting happens are somewhat arbitrary: they are the places where people thought it would be helpful. Nothing prevents the parser from avoiding those splitting methods.
There is a brief mention in macro.proc.token.conversion.from-proc_macro of how proc-macro output is glued/split. The proc-macro `Spacing` type briefly touches on it as well.
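For the "gluing" direction, here is a minimal sketch using an invented `Spacing` enum modeled on the proc-macro type (this is illustrative only, not the actual `proc_macro` implementation): adjacent punctuation characters marked `Joint` are merged into one compound token, while `Alone` ends the run.

```rust
/// Invented mirror of the proc-macro `Spacing` concept for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Spacing {
    Alone,
    Joint,
}

/// Glue a sequence of single-character punctuation tokens back into
/// compound tokens: a `Joint` character sticks to the next one, and an
/// `Alone` character finishes the current compound token.
fn glue(puncts: &[(char, Spacing)]) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for &(ch, sp) in puncts {
        cur.push(ch);
        if sp == Spacing::Alone {
            out.push(std::mem::take(&mut cur));
        }
    }
    // A trailing Joint character still produces a token.
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

fn main() {
    // `<` marked Joint followed by `-` marked Alone glues into `<-`.
    let toks = glue(&[('<', Spacing::Joint), ('-', Spacing::Alone)]);
    assert_eq!(toks, vec!["<-".to_string()]);
}
```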
Note that reference patterns and borrow operators specifically include the `&&` token in their grammar. I don't remember the exact details of why we did it that way (that is, why did we do this just for `&&` and for no other token?). Depending on how we approach this, we may need to consider how that might need to change.
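The practical effect is that a double reference can be written and matched directly, because both the borrow-expression and reference-pattern grammars accept the `&&` token itself rather than requiring it to be split into two `&` tokens:

```rust
fn main() {
    // `&&5` is lexed as the single `&&` token followed by `5`; the
    // borrow-operator grammar accepts `&&` as two levels of borrow.
    let r: &&i32 = &&5;
    // Likewise, the reference-pattern grammar includes `&&`, so one
    // token dereferences through both levels at once.
    let &&value = r;
    assert_eq!(value, 5);
}
```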
Note that I included some of the jointed tokens in lex.token.punct.intro. We may want to remove some. For example, I have `>>=` and `>=` linking to generics.
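For what it's worth, both of those tokens can appear in positions where generics force them to be split, which is presumably the reason for the links:

```rust
fn main() {
    // With no space before `=`, the lexer produces a single `>>=`
    // shift-assign token; the parser splits it into `>`, `>`, and `=`
    // to close both generic argument lists and reach the initializer.
    let v: Vec<Vec<i32>>= vec![vec![1]];
    // Similarly, `>=` here is split into `>` and `=`.
    let w: Vec<i32>= vec![2];
    assert_eq!(v[0][0] + w[0], 3);
}
```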
Some relevant links: