Merge pull request idris-lang#4805 from martinbaker/parseLibDoc2
Extend and Fix Parser Library Documentation
melted authored Feb 1, 2020
2 parents 9074a76 + 161573f commit fc0acd9
Showing 4 changed files with 353 additions and 173 deletions.
14 changes: 7 additions & 7 deletions docs/parserLibrary/introduction.rst
@@ -25,21 +25,21 @@ complicated parser can be made out of many smaller ones.
- Lexer - This takes the input string and turns it into a list of Tokens.
- Parser - This takes the list of tokens and outputs the code.

-.. image:: ../image/parserTopLevel.png
-   :width: 307px
-   :height: 76px
-   :alt: diagram illustrating these stages of lexer and parser
+- .. image:: ../image/parserTopLevel.png
+     :width: 307px
+     :height: 76px
+     :alt: diagram illustrating these stages of lexer and parser

The advantage of using two stages, like this, is that things like whitespace and
comments don't need to be considered in every parser rule.

-The Idris parser library differs from Parsec in that you need to say in the
-Recogniser whether a rule is guaranteed to consume input. This means that the
+The Idris parser library differs from Parsec in that you need to say, in the
+Recogniser, whether a rule is guaranteed to consume input. This means that the
Idris type system prevents the construction of recognisers/rules that can't
consume the input.
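
For example (a minimal sketch, assuming the contrib ``Text.Lexer`` module as it
stood around this commit, in which ``Lexer`` is an abbreviation for
``Recognise True``), the consumption guarantee is visible directly in the types:

.. code-block:: idris

    import Text.Lexer

    -- 'is' matches exactly one character, so the rule is guaranteed to
    -- consume input; the type Lexer records this as Recognise True.
    matchA : Lexer
    matchA = is 'a'

    -- Sequencing two consuming recognisers still guarantees consumption.
    matchAB : Recognise True
    matchAB = is 'a' <+> is 'b'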

The Lexer is similar but works at the level of characters rather than tokens, and
-you need to provide a TokenMap which says how to build a token representation from
+you need to provide a TokenMap which defines how to build a token from
a string when a rule is matched.
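
As an illustration of both points (a hedged sketch: the ``Tok`` type is made up
for this example, and it assumes the contrib helpers ``spaces`` and ``alphas``),
whitespace can be handled once at the lexing stage by giving it its own token,
which parser rules then simply skip:

.. code-block:: idris

    import Text.Lexer

    -- An illustrative token type; the tutorial defines its own later.
    data Tok = Word String
             | Whitespace String

    -- Each TokenMap entry pairs a Lexer with a String -> Tok function that
    -- builds the token from the matched text.
    tokMap : TokenMap Tok
    tokMap = [(spaces, Whitespace),
              (alphas, Word)]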

The following pages contain a tutorial for the :ref:`parserLibraryLexer` and a
44 changes: 25 additions & 19 deletions docs/parserLibrary/lexer.rst
@@ -10,13 +10,16 @@ module:
lex : TokenMap a -> String -> (List (TokenData a), (Int, Int, String))
-This function takes a String and returns a list of tokens. With the tokens we
-have indexes back to the original string which can be used in error messages:
+.. list-table::
+
+  * - This function takes a String and returns a list of tokens. With the
+      tokens we have indexes back to the original string which can be used
+      in error messages:

-.. image:: ../image/tokenise.png
-   :width: 330px
-   :height: 73px
-   :alt: diagram illustrating these stages of lexer
+    - .. image:: ../image/tokenise.png
+         :width: 330px
+         :height: 73px
+         :alt: diagram illustrating these stages of lexer
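
The per-token indexes live in ``TokenData``, which pairs each token with the
line and column at which it was matched. Roughly (a paraphrase; see
``Text.Lexer.Core`` in the contrib package for the authoritative definition):

.. code-block:: idris

    -- Approximate shape of TokenData: the matched token plus the line and
    -- column where the match started, which error messages can point at.
    record TokenData a where
      constructor MkToken
      line : Int
      col : Int
      tok : a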

TokenMap
--------
@@ -34,21 +34,24 @@ this information:
(Lexer, String -> tokenType)
So for each Lexer in the list, if a substring in the input matches, run
-the associated function to produce a token of type `tokenType`
+the associated function to produce a token of type ``tokenType``

-from Text.Lexer.Core:
+.. list-table::

-.. code-block:: idris
+  * - from Text.Lexer.Core
+      : https://github.com/idris-lang/Idris-dev/blob/master/libs/contrib/Text/Lexer/Core.idr

-    TokenMap : (tokenType : Type) -> Type
-    TokenMap tokenType = List (Lexer, String -> tokenType)
+    - .. code-block:: idris

-We can create a tokenMap by using a function like this:
+        TokenMap : (tokenType : Type) -> Type
+        TokenMap tokenType = List (Lexer, String -> tokenType)

-.. code-block:: idris
+  * - We can create a tokenMap by using a function like this:

+    - .. code-block:: idris

-    myTokenMap : TokenMap Token
-    myTokenMap = [(is 'a',CharLit)]
+        myTokenMap : TokenMap Token
+        myTokenMap = [(is 'a',CharLit)]

Once we have a TokenMap we can use it to lex many strings.
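
For instance (a hedged sketch reusing the ``myTokenMap`` example above; the
behaviour described in the comments is indicative rather than checked output):

.. code-block:: idris

    import Text.Lexer

    -- Lex one input string with the TokenMap defined above. The result
    -- pairs the recognised tokens with the line, column and remaining
    -- input at the point where lexing stopped.
    lexed : (List (TokenData Token), (Int, Int, String))
    lexed = lex myTokenMap "aab"
    -- Only 'a' is recognised by myTokenMap, so the two leading 'a's become
    -- tokens and "b" is left in the String component of the second part.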

@@ -74,7 +80,7 @@ combined, for example,

.. list-table::

-* - <+> means sequence two recognisers. If either consumes a character,
+* - ``<+>`` means sequence two recognisers. If either consumes a character,
the sequence is guaranteed to consume a character.

- .. code-block:: idris
@@ -83,7 +89,7 @@
SeqEat (Pred (\ARG => intToBool (prim__eqChar ARG 'a')))
(Delay (is 'b')) : Recognise True
-* - <|> means if both consume, the combination is guaranteed
+* - ``<|>`` means if both consume, the combination is guaranteed
to consume a character:

- .. code-block:: idris
@@ -161,7 +167,7 @@ able to parse simple arithmetic expressions, like this:
so we need:

- Numbers (for now integer literals are good enough).
-- Some operators (for now '+', '-' and '*' will do).
+- Some operators (for now ``+``, ``-`` and ``*`` will do).
- Opening and closing Parentheses.

We can define these, as tokens, like this:
@@ -178,7 +184,7 @@ We can define these, as tokens, like this:
| EndInput
It may help with debugging and to implement error messages to
-implement 'show' for these tokens:
+implement ``show`` for these tokens:

.. code-block:: idris
