Regular Expressions
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
Regex examples
A simple example of a regular expression is a (literal) string. For example, the Hello World regex matches the "Hello World" string.
. (dot) is another example of a regular expression. A dot matches any single character; it would match, for example, "a" or "1".
The following table lists several regular expressions and describes which pattern they would match.
Regex | Matches |
---|---|
this is text | Matches exactly "this is text". |
this\s+is\s+text | Matches the word "this" followed by one or more whitespace characters, followed by the word "is", followed by one or more whitespace characters, followed by the word "text". |
^\d+(\.\d+)? | ^ requires the pattern to start at the beginning of a line. \d+ matches one or more digits. \. matches a literal ".", the parentheses group \.\d+, and the ? makes that group optional. Matches, for example, "5", "1.5" and "2.21". |
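As a minimal sketch, the three patterns from the table can be tried out with Python's `re` module. The choice of Python and the variable names are assumptions made for illustration only; the patterns themselves come from the table.

```python
import re

# Patterns from the table, written as raw strings so that Python
# passes the backslashes through to the regex engine unchanged.
literal = re.compile(r"this is text")
spaced = re.compile(r"this\s+is\s+text")
number = re.compile(r"^\d+(\.\d+)?")

# The literal pattern needs exactly one space between the words.
literal_hit = literal.search("I think this is text.") is not None
literal_miss = literal.search("this  is text") is None

# \s+ accepts any run of whitespace, including tabs.
spaced_hit = spaced.search("this   is\ttext") is not None

# The number pattern accepts an optional fractional part.
number_matches = [number.match(s).group(0) for s in ("5", "1.5", "2.21")]
no_number = number.match("abc") is None

print(literal_hit, literal_miss, spaced_hit, number_matches, no_number)
```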
Common matching symbols
Regular Expression | Description |
---|---|
. | Matches any character |
^regex | Finds regex that must match at the beginning of the line. |
regex$ | Finds regex that must match at the end of the line. |
[abc] | Set definition, can match the letter a or b or c. |
[abc][vz] | Set definition, can match a or b or c followed by either v or z. |
[^abc] | When a caret appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c. |
[a-d1-7] | Ranges: matches a single letter from a to d or a single digit from 1 to 7, but not the two-character string d1. |
X\|Z | Finds X or Z. |
XZ | Finds X directly followed by Z. |
$ | Checks if a line end follows. |
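The matching symbols above can be checked with a short script. This is a sketch in Python's `re` module (an assumed choice of language); the sample strings and variable names are made up for illustration.

```python
import re

# . matches any single character between "a" and "c".
any_char = re.findall(r"a.c", "abc a1c ac")

# ^ and $ anchor the match to the start and end of the input.
starts = re.search(r"^Hello", "Hello World") is not None
ends = re.search(r"World$", "Hello World") is not None

# Sets, negated sets and ranges each match exactly one character.
set_pairs = re.findall(r"[abc][vz]", "av bz cx dv")
negated = re.findall(r"[^abc]", "abcd1")
ranged = re.findall(r"[a-d1-7]", "d1 e8")

# X|Z is an alternation, XZ is a plain sequence.
either = re.findall(r"X|Z", "XYZ")
sequence = re.search(r"XZ", "aXZb") is not None

print(any_char, set_pairs, negated, ranged, either)
```

Note that `[a-d1-7]` finds "d" and "1" as two separate one-character matches, which is why it does not match "d1" as a unit.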
Meta characters
The following meta characters have a pre-defined meaning and make certain common patterns easier to use, e.g., \d instead of [0-9].
Regular Expression | Description |
---|---|
\d | Any digit, short for [0-9] |
\D | A non-digit, short for [^0-9] |
\s | A whitespace character, short for [ \t\n\x0b\r\f] |
\S | A non-whitespace character, short for [^\s] |
\w | A word character, short for [a-zA-Z_0-9] |
\W | A non-word character, short for [^\w] |
\S+ | Several non-whitespace characters |
\b | Matches a word boundary where a word character is [a-zA-Z0-9_] |
These meta characters have the same first letter as their representation, e.g., digit, space, word, and boundary. Uppercase symbols define the opposite.
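A brief sketch of the meta characters in use, assuming Python's `re` module and made-up sample strings. In Python, `\d`, `\w` and `\s` also match non-ASCII Unicode characters by default; the inputs here are plain ASCII, so that difference does not show up.

```python
import re

text = "Order 66 shipped on day 7"

# \d matches one digit at a time, \D one non-digit.
digits = re.findall(r"\d", text)
first_non_digit = re.search(r"\D", "42nd").group(0)

# \S+ collects runs of non-whitespace characters.
tokens = re.findall(r"\S+", "a1  b2\tc3")

# \w covers letters, digits and the underscore; \W is everything else.
words = re.findall(r"\w+", "hi, there_1!")
non_words = re.findall(r"\W", "hi, there_1!")

# \b only matches at a word boundary, so "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(digits, first_non_digit, tokens, words, non_words, whole_words)
```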
Quantifier
A quantifier defines how often an element can occur. The symbols ?, *, + and {} define how many times the preceding element of the regular expression may repeat.
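Regular Expression | Description | Examples |
---|---|---|
* | Occurs zero or more times, is short for {0,} | X* finds zero or more letters X |
+ | Occurs one or more times, is short for {1,} | X+ finds one or more letters X |
? | Occurs no or one times, is short for {0,1} | X? finds no or exactly one letter X |
{X} | Occurs X number of times | \d{3} finds exactly three digits |
{X,Y} | Occurs between X and Y times | \d{1,4} finds at least one and at most four digits |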
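The quantifiers can be compared side by side in a small experiment. This is a sketch using Python's `re` module (an assumption, as are the sample patterns and strings); `re.fullmatch` is used so that a pattern has to cover the whole input.

```python
import re

# * allows zero occurrences of "b", + requires at least one.
star_ok = re.fullmatch(r"ab*", "a") is not None
plus_fails = re.fullmatch(r"ab+", "a") is None

# ? makes the preceding "u" optional, but does not allow two of them.
optional = [bool(re.fullmatch(r"colou?r", w))
            for w in ("color", "colour", "colouur")]

# {3} demands exactly three digits; \b keeps longer numbers out.
exactly_three = re.findall(r"\b\d{3}\b", "12 123 1234")

# {1,4} accepts between one and four digits.
one_to_four = re.findall(r"\b\d{1,4}\b", "7 12345 2024")

print(star_ok, plus_fails, optional, exactly_three, one_to_four)
```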
Negative look ahead
Negative look ahead makes it possible to exclude a pattern: it lets you state that a string must not be followed by another string. A negative look ahead is defined via (?!pattern). For example, the following matches "a" only if it is not followed by "b": a(?!b)
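The a(?!b) example can be verified with a few lines of code. This sketch assumes Python's `re` module; the sample string is invented for illustration.

```python
import re

pattern = re.compile(r"a(?!b)")

# Index:   0123456
sample = "ab ac a"

# The "a" at index 0 is followed by "b" and is skipped. The "a" at
# index 6 is followed by nothing at all, so the look ahead succeeds.
positions = [m.start() for m in pattern.finditer(sample)]

# The look ahead only checks what follows; it is not part of the match.
matched_text = pattern.search("ac").group(0)

print(positions, matched_text)
```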
Specifying modes inside the regular expression
You can add the mode modifiers to the start of the regex. To specify multiple modes, simply put them together as in (?ismx).
- (?i) makes the regex case insensitive.
- (?s) for "single line mode" makes the dot match all characters, including line breaks.
- (?m) for "multi-line mode" makes the caret and dollar match at the start and end of each line in the subject string.
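The effect of each modifier is easiest to see by running the same pattern with and without it. The following is a sketch in Python's `re` module, which accepts these inline modifiers at the start of a pattern; other regex engines may treat them differently, and the sample strings are assumptions.

```python
import re

# (?i): case-insensitive matching.
case_insensitive = re.search(r"(?i)hello", "HeLLo World") is not None

# (?s): the dot also matches a line break.
dot_default = re.search(r"a.b", "a\nb") is None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?m): ^ and $ match at the start and end of every line.
text = "first\nsecond\nthird"
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"(?m)^\w+", text)

# Modifiers can be combined, here (?im).
combined = re.findall(r"(?im)^s\w+$", "First\nSECOND\nthird")

print(case_insensitive, dot_default, dot_all)
print(starts_default, starts_multiline, combined)
```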
Backslashes
The backslash \ is an escape character in regular expressions, which means it has a predefined meaning. To match a single literal backslash, you have to write a double backslash \\ in the pattern.
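The following sketch shows the double backslash in practice, assuming Python's `re` module. The doubling inside ordinary string literals is specific to the host language: in Python, a raw string (`r"..."`) avoids the second layer of escaping. The sample path is made up.

```python
import re

# The Python literal "C:\\temp" is the 7-character string C:\temp.
path = "C:\\temp"

# The regex \\ matches one literal backslash. As a raw string it is
# written once; as an ordinary string every backslash is doubled again.
raw_pattern = r"\\"
plain_pattern = "\\\\"
same_pattern = raw_pattern == plain_pattern

backslash_count = len(re.findall(raw_pattern, path))
parts = re.split(raw_pattern, path)

# Escaping also turns a special character into a literal one.
literal_dots = re.findall(r"\.", "1.5")

print(same_pattern, backslash_count, parts, literal_dots)
```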