A regular expression is a group of characters or symbols used to find a specific pattern in a text.
A regular expression is a pattern that is matched against a subject string from left to right. Because "regular expression" is a mouthful, you will usually find the term abbreviated as "regex" or "regexp". Regular expressions are used for replacing text within a string, validating forms, extracting a substring from a string based on a pattern match, and much more.
Imagine you are writing an application and you want to set rules for how users choose their usernames. We want a username to contain only lowercase letters, numbers, underscores and hyphens. We also want to limit the number of characters in a username so it does not look ugly. We use the following regular expression to validate a username:
The regular expression above accepts the strings "john_doe", "jo-hn_doe" and "john12_as". It does not match "Jo" because that string contains an uppercase letter and is also too short.
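
The username rules described here can be sketched in Python. The pattern below is an assumption made for illustration, not necessarily the one intended by this text: the length limits of 3 and 16 characters and the names `USERNAME_RE` and `is_valid_username` are hypothetical.

```python
import re

# Hypothetical pattern: 3 to 16 characters, each one a lowercase letter,
# a digit, an underscore or a hyphen. The limits 3 and 16 are assumed.
USERNAME_RE = re.compile(r"[a-z0-9_-]{3,16}")


def is_valid_username(name: str) -> bool:
    # fullmatch succeeds only when the whole string fits the pattern.
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Under these assumptions the first three names are accepted and "Jo" is rejected, which agrees with the behaviour described above.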
- Basic Matchers
- Meta Characters
- Quantifiers
- OR operator
- Character Sets
- Shorthand Character Sets
- Grouping
- Lookaheads
- Flags
A regular expression is just a pattern of letters and digits that we use to search a text. For example, the regular expression `cat` means: the letter `c`, followed by the letter `a`, followed by the letter `t`.
"cat" => The cat sat on the mat
The regular expression `123` matches the string "123". The regular expression is matched against an input string by comparing each character in the regular expression to each character in the input string, one after another. Regular expressions are normally case-sensitive, so the regular expression `Cat` would not match the string "cat".
"Cat" => The cat sat on the Cat
Meta characters are the building blocks of regular expressions. Meta characters do not stand for themselves but are instead interpreted in a special way. Some meta characters take on a different meaning when written inside square brackets. The meta characters are as follows:

Meta character | Description |
---|---|
`.` | Period. Matches any single character except a line break. |
`[ ]` | Character class. Matches any character contained between the square brackets. |
`[^ ]` | Negated character class. Matches any character that is not contained between the square brackets. |
`*` | Matches 0 or more repetitions of the preceding symbol. |
`+` | Matches 1 or more repetitions of the preceding symbol. |
`?` | Makes the preceding symbol optional. |
`{n}` | Braces. Matches “n” repetitions of the preceding symbol. |
`(xyz)` | Character group. Matches the characters xyz in that exact order. |
`\|` | Alternation. Matches either the characters before or the characters after the symbol. |
`\` | Escapes the next character. This allows you to match the reserved characters `[ ] ( ) { } . * + ? ^ $ \ \|` |
`^` | Matches the beginning of the input. |
`$` | Matches the end of the input. |

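
To illustrate the escape character from the table, the following Python sketch compares an unescaped and an escaped period. The sample strings are made up for illustration; `re.escape` is a standard-library helper that adds the backslashes for you.

```python
import re

# Unescaped, the period is a meta character and matches any character.
print(bool(re.search(r"3.14", "3x14")))   # True

# Escaped, it matches only a literal period.
print(bool(re.search(r"3\.14", "3x14")))  # False
print(bool(re.search(r"3\.14", "3.14")))  # True

# re.escape() escapes reserved characters in a literal string.
print(re.escape("3.14"))                  # 3\.14
```
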
The full stop `.` is the simplest example of a meta character. The meta character `.` matches any single character. It will not match a newline character and, in some engines, it will not match a carriage return either. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the letter `r`.
".ar" => The car parked in the garage.
Character sets are also called character classes. Square brackets are used to specify character sets. Use a hyphen inside a character set to specify a range of characters. The order of the characters inside the square brackets doesn't matter, although a range itself must run from the lower character to the higher one. For example, the regular expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.
"[Tt]he" => The car parked in the garage.
In general, the caret symbol represents the start of the string, but when it is typed immediately after the opening square bracket it negates the character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by the letter `r`.
"[^c]ar" => The car parked in the garage.
We can repeat a character class by using the `+`, `*` or `?` operators. For example, the regular expression `[a-z]+` means: one or more lowercase letters in a row.
"[a-z]+" => The car parked in the garage.