I think the behavior around token splitting should be documented in some way, though I'm uncertain exactly how. The general idea is that the lexer chapter defines tokens, such as `<-`, which are then split into smaller tokens in certain situations. AFAIK, the reference doesn't really define how this works. I also don't know how much of this is part of the lexer versus the parser; I suspect it needs to be addressed in both, depending on which approach we take.
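Nested generics are the classic case: the lexer produces a single `>>` token for two adjacent closing angle brackets, and the parser must treat it as two `>` tokens.

```rust
fn main() {
    // The lexer sees one `>>` token at the end of the type annotation,
    // but the parser splits it into two `>` tokens so that both `Vec`
    // generic argument lists can be closed.
    let nested: Vec<Vec<u8>> = Vec::new();
    assert!(nested.is_empty());
}
```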
A question I have is whether it should be documented as tokens being split or as tokens being glued, since either approach seems viable.
I think in rustc, the primary method currently used to handle this is `break_and_eat`. However, I may be misremembering; there may be other places where this is handled.
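To illustrate the "splitting" direction, here is a hypothetical sketch (the enum and function names are invented for illustration; this is not rustc's actual `break_and_eat` implementation): when the parser expects a lone `>` but sees a compound token starting with `>`, it breaks off the first character and keeps the remainder as a new token.

```rust
// Invented token kinds for the sketch; rustc's real representation differs.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Gt,     // `>`
    GtGt,   // `>>`
    GtEq,   // `>=`
    GtGtEq, // `>>=`
    Eq,     // `=`
}

/// Split a `>` off the front of a compound token, returning the `>` plus
/// whatever token remains (if any). Returns `None` if the token does not
/// start with `>`.
fn break_first_gt(tok: &Token) -> Option<(Token, Option<Token>)> {
    match tok {
        Token::Gt => Some((Token::Gt, None)),
        Token::GtGt => Some((Token::Gt, Some(Token::Gt))),
        Token::GtEq => Some((Token::Gt, Some(Token::Eq))),
        Token::GtGtEq => Some((Token::Gt, Some(Token::GtEq))),
        _ => None,
    }
}

fn main() {
    // Closing two generic argument lists followed by `=`:
    // `>>=` splits into `>` plus a remaining `>=`, which can be split again.
    assert_eq!(
        break_first_gt(&Token::GtGtEq),
        Some((Token::Gt, Some(Token::GtEq)))
    );
    assert_eq!(break_first_gt(&Token::Eq), None);
}
```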
IIUC, the places where this splitting happens are somewhat arbitrary: they are the places where people thought it would be helpful. Nothing prevents the parser from avoiding those splitting methods.
There is a brief mention in macro.proc.token.conversion.from-proc_macro of how proc-macro output is glued/split. The proc-macro `Spacing` type briefly touches on it as well.
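For the "gluing" direction, here is a minimal sketch using an invented `Spacing` enum modeled on the proc-macro type (this is illustrative only, not the actual `proc_macro` implementation): adjacent punctuation characters marked `Joint` are merged into one compound token, while `Alone` ends the run.

```rust
/// Invented mirror of the proc-macro `Spacing` concept for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Spacing {
    Alone,
    Joint,
}

/// Glue a sequence of single-character punctuation tokens back into
/// compound tokens: a `Joint` character sticks to the next one, and an
/// `Alone` character finishes the current compound token.
fn glue(puncts: &[(char, Spacing)]) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for &(ch, sp) in puncts {
        cur.push(ch);
        if sp == Spacing::Alone {
            out.push(std::mem::take(&mut cur));
        }
    }
    // A trailing Joint character still produces a token.
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

fn main() {
    // `<` marked Joint followed by `-` marked Alone glues into `<-`.
    let toks = glue(&[('<', Spacing::Joint), ('-', Spacing::Alone)]);
    assert_eq!(toks, vec!["<-".to_string()]);
}
```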
Note that reference patterns and borrow operators specifically include the `&&` token in their grammar. I don't remember the exact details of why we did it that way (that is, why did we do this just for `&&` and for no other token?). Depending on how we approach this, we may need to consider how that might need to change.
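The practical effect is that a double reference can be written and matched directly, because both the borrow-expression and reference-pattern grammars accept the `&&` token itself rather than requiring it to be split into two `&` tokens:

```rust
fn main() {
    // `&&5` is lexed as the single `&&` token followed by `5`; the
    // borrow-operator grammar accepts `&&` as two levels of borrow.
    let r: &&i32 = &&5;
    // Likewise, the reference-pattern grammar includes `&&`, so one
    // token dereferences through both levels at once.
    let &&value = r;
    assert_eq!(value, 5);
}
```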
Note that I included some of the jointed tokens in lex.token.punct.intro. We may want to remove some. For example, I have `>>=` and `>=` linking to generics.
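For what it's worth, both of those tokens can appear in positions where generics force them to be split, which is presumably the reason for the links:

```rust
fn main() {
    // With no space before `=`, the lexer produces a single `>>=`
    // shift-assign token; the parser splits it into `>`, `>`, and `=`
    // to close both generic argument lists and reach the initializer.
    let v: Vec<Vec<i32>>= vec![vec![1]];
    // Similarly, `>=` here is split into `>` and `=`.
    let w: Vec<i32>= vec![2];
    assert_eq!(v[0][0] + w[0], 3);
}
```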
Some relevant links: