Commit

Adds lookaround assertions closes Raku#2009
JJ committed Jul 22, 2019
1 parent 616e6b0 commit 6339327
Showing 1 changed file with 53 additions and 21 deletions.
74 changes: 53 additions & 21 deletions doc/Language/regexes.pod6
@@ -193,12 +193,12 @@ alphabet) match C<\d>, but also digits from other scripts.
Examples for digits are:
=begin code :lang<text>
U+0035 5 DIGIT FIVE
U+0BEB ௫ TAMIL DIGIT FIVE
U+0E53 ๓ THAI DIGIT THREE
U+17E5 ៥ KHMER DIGIT FIVE
=end code
=head3 X<C<\w> and C<\W>|regex,\w;regex,\W>
@@ -425,15 +425,15 @@ which takes a single L<Int|/type/Int> or a L<Range|/type/Range> on the right-han
the number of times to match. If a L<Range|/type/Range> is specified, its end-points specify
the minimum and maximum number of times to match.
=begin code
say 'abcdefg' ~~ /\w ** 4/; # OUTPUT: «「abcd」␤»
say 'a' ~~ /\w ** 2..5/; # OUTPUT: «Nil␤»
say 'abc' ~~ /\w ** 2..5/; # OUTPUT: «「abc」␤»
say 'abcdefg' ~~ /\w ** 2..5/; # OUTPUT: «「abcde」␤»
say 'abcdefg' ~~ /\w ** 2^..^5/; # OUTPUT: «「abcd」␤»
say 'abcdefg' ~~ /\w ** ^3/; # OUTPUT: «「ab」␤»
say 'abcdefg' ~~ /\w ** 1..*/; # OUTPUT: «「abcdefg」␤»
=end code
Only basic literal syntax for the right-hand side of the quantifier
is supported, to avoid ambiguities with other regex constructs. If you need
@@ -550,16 +550,16 @@ single letter to match the C<\w+> expression at the end of the line.
By default, quantifiers request a greedy match:
=begin code
'abababa' ~~ /a .* a/ && say ~$/; # OUTPUT: «abababa␤»
=end code
=for code
'abababa' ~~ /a .* a/ && say ~$/; # OUTPUT: «abababa␤»
You can attach a C<?> modifier to the quantifier to enable frugal
matching:
=begin code
'abababa' ~~ /a .*? a/ && say ~$/; # OUTPUT: «aba␤»
=end code
=for code
'abababa' ~~ /a .*? a/ && say ~$/; # OUTPUT: «aba␤»
You can also enable frugal matching for general quantifiers:
@@ -888,6 +888,38 @@ lookahead and lookbehind assertions.
Technically, anchors are also zero-width assertions, and they can look
both ahead and behind.
=head2 X«Lookaround assertions|regex,positive lookaround assertion;regex,negative lookaround assertion»
Lookaround assertions work both ways: they can appear before or after the
pattern they constrain. They match a character, but they don't consume it.
=begin code
my regex key {^^ <![#-]> \d+ }
say "333" ~~ &key; # OUTPUT: «「333」␤»
say '333$' ~~ m/ \d+ <?[$]>/; # OUTPUT: «「333」␤»
say '$333' ~~ m/^^ <?[$]> . \d+ /; # OUTPUT: «「$333」␤»
=end code
They can be positive or negative: C<![]> is negative, while C<?[]> is
positive; the square brackets contain the characters or backslashed
character classes to be matched.
You can also use predefined character classes directly, and Unicode
properties preceded by a colon:
=for code
say '333' ~~ m/^^ <?alnum> \d+ /; # OUTPUT: «「333」␤»
say '333' ~~ m/^^ <?:Nd> \d+ /; # OUTPUT: «「333」␤»
say '333' ~~ m/^^ <!:L> \d+ /; # OUTPUT: «「333」␤»
say '333' ~~ m/^^ \d+ <!:Script<Tamil>> /; # OUTPUT: «「33」␤»
In the first two cases, the character class matches, but does not consume,
the first digit, which is then consumed by C<\d+>; in the third, the
negative lookaround assertion likewise consumes nothing. In the fourth
statement, the last digit is matched but not consumed, so the match includes
only the first two digits.
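The difference between consuming a character and merely asserting it can be
seen by putting a plain character class next to its lookaround counterparts.
The following is a minimal sketch reusing the patterns shown above; the sample
strings are made up for illustration:
=begin code
# Plain character class: the trailing '$' is consumed and is part of the match.
say '333$' ~~ m/ \d+ <[$]> /;          # OUTPUT: «「333$」␤»
# Positive lookaround: '$' must follow the digits, but stays out of the match.
say '333$' ~~ m/ \d+ <?[$]> /;         # OUTPUT: «「333」␤»
# Negative lookaround: no match, because the line starts with '#'.
say so '#333' ~~ m/^^ <![#-]> \d+ /;   # OUTPUT: «False␤»
=end code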
=head2 X<Lookahead assertions|regex,before>
To check that a pattern appears before another pattern, use a