2.14. Syntax of Regular Expressions

Contents

1. Introduction

2. Simple matches

3. Escape sequences

4. Character classes

5. Metacharacters

5.1. Metacharacters - line separators
5.2. Metacharacters - predefined classes
5.3. Metacharacters - word boundaries
5.4. Metacharacters - iterators
5.5. Metacharacters - alternatives
5.6. Metacharacters - subexpressions
5.7. Metacharacters - backreferences

6. Assertions (lookahead and lookbehind assertions)

7. Non-capturing groups

8. Atomic groups

9. Unicode categories

10. Modifiers

Double Commander uses the free library TRegExpr by Andrey Sorokin.

Most of the explanations are from the help file for this library.

1. Introduction

Regular expressions are a widely used method of specifying patterns of text to search for. Special characters (metacharacters) allow us to specify, for instance, that a particular string we are looking for occurs at the beginning or end of a line, or contains n recurrences of a certain character or group of characters.

Double Commander supports regular expressions in the following functions:

Find files (file names or content search)
Multi-Rename Tool
Internal editor
Internal viewer

The TRegExpr library supports two modes of operation: ANSI and Unicode. When searching in text files, Double Commander uses both (depending on the file encoding). When searching by name, Unicode is used.

Example of simple match
Expression	Result
foobar	matches string `foobar`
\^FooBarPtr	matches `^FooBarPtr`
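
The two rows above can be tried out with a short sketch. It uses Python's `re` module as a stand-in for TRegExpr, which is an assumption made purely for illustration; the two engines agree on literal matching and on escaping a metacharacter with a backslash.

```python
import re

# A plain pattern matches itself literally.
plain = re.search(r"foobar", "xx foobar yy")

# "^" is a metacharacter, so a backslash is needed to match it literally.
escaped = re.search(r"\^FooBarPtr", "p = ^FooBarPtr;")

print(plain.group(0))    # foobar
print(escaped.group(0))  # ^FooBarPtr
```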

Escape sequences
Expression	Result
\xnn	char with hex code `nn`
\x{nnnn}	char with hex code `nnnn` (one byte for plain text and two bytes for Unicode)
\t	tab (HT/TAB), same as `\x09`
\n	newline (NL/LF), same as `\x0a`
\r	carriage return (CR), same as `\x0d`
\f	form feed (FF), same as `\x0c`
\a	alarm (bell) (BEL), same as `\x07`
\e	escape (ESC), same as `\x1b`

Example of escape sequences
Expression	Result
foo\x20bar	matches `foo bar` (note space in the middle)
\tfoobar	matches `foobar` preceded by a tab
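
A minimal sketch of these two examples, again assuming Python's `re` module in place of TRegExpr. The `\xnn` and `\t` escapes behave the same way there; the `\x{nnnn}` form is not supported by Python and is left out.

```python
import re

# \x20 is the space character, so the pattern matches "foo bar".
space_match = re.fullmatch(r"foo\x20bar", "foo bar")

# \t in the pattern matches a literal tab in front of "foobar".
tab_match = re.search(r"\tfoobar", "key\tfoobar")

print(space_match is not None)  # True
print(tab_match is not None)    # True
```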

Character classes
Expression	Result
[-az]	matches `a`, `z` and `-`
[az-]	matches `a`, `z` and `-`
[a\-z]	matches `a`, `z` and `-`
[a-z]	matches all twenty-six lowercase letters from `a` to `z`
[\n-\x0D]	matches any of `\x0A`, `\x0B`, `\x0C`, `\x0D`
[\d-t]	matches any digit, `-` or `t`
[]-a]	matches any char from `]`..`a`

Example of character classes
Expression	Result
foob[aeiou]r	finds strings `foobar`, `foober` etc. but not `foobbr`, `foobcr` etc.
foob[^aeiou]r	finds strings `foobbr`, `foobcr` etc. but not `foobar`, `foober` etc.
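
The character class examples can be checked with the sketch below. Python's `re` module is assumed here as a stand-in for TRegExpr, and the word list is made up for the demonstration.

```python
import re

words = ["foobar", "foober", "foobbr", "foobcr"]

# [aeiou] accepts exactly one vowel at that position.
vowel = [w for w in words if re.fullmatch(r"foob[aeiou]r", w)]

# [^aeiou] accepts exactly one character that is not a vowel.
non_vowel = [w for w in words if re.fullmatch(r"foob[^aeiou]r", w)]

# A "-" placed first in the class is a literal dash, not a range.
dash_class = re.findall(r"[-az]", "a-b-z")

print(vowel)       # ['foobar', 'foober']
print(non_vowel)   # ['foobbr', 'foobcr']
print(dash_class)  # ['a', '-', '-', 'z']
```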

Line separators
Expression	Result
^	start of line
$	end of line
\A	start of text
\Z	end of text
.	any character in line

Example with line separators
Expression	Result
^foobar	matches string `foobar` only if it is at the beginning of a line
foobar$	matches string `foobar` only if it is at the end of a line
^foobar$	matches string `foobar` only if it is the entire line
foob.r	matches strings like `foobar`, `foobbr`, `foob1r` and so on
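
A small sketch of the line anchors, assuming Python's `re` module instead of TRegExpr. In Python, `^` and `$` match at every line only when `re.MULTILINE` is passed, so the flag is set explicitly here to mirror the per-line behaviour described in the table.

```python
import re

text = "foobar\nxfoobar\nfoobar"

# Offsets where "foobar" begins a line (first and third line).
starts = [m.start() for m in re.finditer(r"^foobar", text, re.MULTILINE)]

# Lines that consist of "foobar" and nothing else.
whole_lines = re.findall(r"^foobar$", text, re.MULTILINE)

# "." stands for any single character within the line.
dot_hits = re.findall(r"foob.r", "foobar foobbr foob1r foobr")

print(starts)       # [0, 15]
print(whole_lines)  # ['foobar', 'foobar']
print(dot_hits)     # ['foobar', 'foobbr', 'foob1r']
```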

Predefined classes
Expression	Result
\w	an alphanumeric character (including `_`), i.e. `[0-9A-Za-z_]`
\W	a non-alphanumeric character
\d	a numeric character
\D	a non-numeric character
\s	any space (same as `[ \t\n\r\f]`)
\S	a non-space character

Example of predefined classes
Expression	Result
foob\dr	matches strings like `foob1r`, `foob6r` and so on but not `foobar`, `foobbr` and so on
foob[\w\s]r	matches strings like `foobar`, `foob r`, `foobbr` and so on but not `foob=r` and so on
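
The predefined classes can be exercised as follows. This is a sketch that assumes Python's `re` module rather than TRegExpr; the candidate strings are taken from the examples above.

```python
import re

candidates = ["foob1r", "foob6r", "foobar", "foob r", "foob=r"]

# \d accepts a single digit.
digit = [s for s in candidates if re.fullmatch(r"foob\dr", s)]

# [\w\s] accepts a word character or a whitespace character.
word_or_space = [s for s in candidates if re.fullmatch(r"foob[\w\s]r", s)]

print(digit)          # ['foob1r', 'foob6r']
print(word_or_space)  # ['foob1r', 'foob6r', 'foobar', 'foob r']
```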

Word boundaries
Expression	Result
\b	matches a word boundary
\B	matches a position that is not a word boundary
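
Word boundaries are easiest to see on a concrete string. The sketch below assumes Python's `re` module in place of TRegExpr; the sample text is invented for the demonstration.

```python
import re

text = "foo foobar barfoo foo."

# \bfoo\b finds "foo" only where it stands as a whole word.
whole_words = [m.start() for m in re.finditer(r"\bfoo\b", text)]

# \Bfoo finds "foo" only where it does NOT start a word (inside "barfoo").
inside_word = [m.start() for m in re.finditer(r"\Bfoo", text)]

print(whole_words)  # [0, 18]
print(inside_word)  # [14]
```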

Iterators
Expression	Result
*	zero or more ("greedy"), similar to `{0,}`
+	one or more ("greedy"), similar to `{1,}`
?	zero or one ("greedy"), similar to `{0,1}`
{n}	exactly `n` times ("greedy")
{n,}	at least `n` times ("greedy")
{n,m}	at least `n` but not more than `m` times ("greedy")
*?	zero or more ("non-greedy"), similar to `{0,}?`
+?	one or more ("non-greedy"), similar to `{1,}?`
??	zero or one ("non-greedy"), similar to `{0,1}?`
{n}?	exactly `n` times ("non-greedy")
{n,}?	at least `n` times ("non-greedy")
{n,m}?	at least `n` but not more than `m` times ("non-greedy")
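
The difference between greedy and non-greedy iterators shows up when more than one match length is possible. The sketch below assumes Python's `re` module, which uses the same `*?` and `{n,m}?` notation; the sample strings are hypothetical.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* takes as much as possible, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Non-greedy: .*? takes as little as possible, so the match stops at the first </b>.
lazy = re.search(r"<b>.*?</b>", text).group(0)

# The same applies to counted iterators.
greedy_count = re.search(r"a{2,3}", "aaaa").group(0)
lazy_count = re.search(r"a{2,3}?", "aaaa").group(0)

print(greedy)        # <b>one</b><b>two</b>
print(lazy)          # <b>one</b>
print(greedy_count)  # aaa
print(lazy_count)    # aa
```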

Examples of backreferences
Expression	Result
(.)\1+	matches `aaaa` and `cc`
(.+)\1+	matches `abab` and `123123`
(['"]?)(\d+)\1	matches `"13"` (in double quotes), or `'4'` (in single quotes) or `77` (without quotes) etc.
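
A sketch of the backreference examples, assuming Python's `re` module as a stand-in for TRegExpr (both use `\1` for the first captured group). The mismatched-quote input is an extra case added for illustration.

```python
import re

# (.)\1+ : one character followed by one or more copies of itself.
same_char = re.fullmatch(r"(.)\1+", "aaaa") is not None

# (.+)\1+ : a group of characters followed by one or more copies of itself.
same_group = re.fullmatch(r"(.+)\1+", "abab") is not None

# The closing quote must be the same as the opening one (or both absent).
quoted = r"""(['"]?)(\d+)\1"""
numbers = [re.fullmatch(quoted, s).group(2) for s in ['"13"', "'4'", "77"]]
mismatched = re.fullmatch(quoted, "\"13'")

print(same_char, same_group)  # True True
print(numbers)                # ['13', '4', '77']
print(mismatched)             # None
```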

Non-capturing groups
Expression	Result
(https?|ftp)://([^/\r\n]+)	in `https://doublecmd.sourceforge.io` captures `https` and `doublecmd.sourceforge.io`
(?:https?|ftp)://([^/\r\n]+)	in `https://doublecmd.sourceforge.io` captures only `doublecmd.sourceforge.io`
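
The effect of `(?:...)` on the captured groups can be seen in the sketch below. It assumes Python's `re` module, which uses the same non-capturing syntax; the alternation between `https?` and `ftp` is written with a plain `|`.

```python
import re

url = "https://doublecmd.sourceforge.io"

# Both parenthesized parts are captured.
capturing = re.match(r"(https?|ftp)://([^/\r\n]+)", url).groups()

# The scheme is grouped for the alternation only and is not captured.
non_capturing = re.match(r"(?:https?|ftp)://([^/\r\n]+)", url).groups()

print(capturing)      # ('https', 'doublecmd.sourceforge.io')
print(non_capturing)  # ('doublecmd.sourceforge.io',)
```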

Unicode categories
Category	Description
L	Letter
Lu	Uppercase Letter
Ll	Lowercase Letter
Lt	Titlecase Letter
Lm	Modifier Letter
Lo	Other Letter

M	Mark
Mn	Non-Spacing Mark
Mc	Spacing Combining Mark
Me	Enclosing Mark

N	Number
Nd	Decimal Digit Number
Nl	Letter Number
No	Other Number

P	Punctuation
Pc	Connector Punctuation
Pd	Dash Punctuation
Ps	Open Punctuation
Pe	Close Punctuation
Pi	Initial Punctuation
Pf	Final Punctuation
Po	Other Punctuation

S	Symbol
Sm	Math Symbol
Sc	Currency Symbol
Sk	Modifier Symbol
So	Other Symbol

Z	Separator
Zs	Space Separator
Zl	Line Separator
Zp	Paragraph Separator

C	Other
Cc	Control
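
To see which category codes from the table apply to concrete characters, the sketch below uses Python's standard `unicodedata` module. This is only an illustration of the codes themselves: it does not use TRegExpr, and Python's `re` module has no built-in syntax for matching by category.

```python
import unicodedata

# Sample characters chosen for the demonstration.
samples = ["A", "a", "5", " ", "_", "-", "(", ")", "$", "+", "\n"]

# Map each character to its two-letter Unicode general category.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(repr(ch), cat)
# 'A' Lu, 'a' Ll, '5' Nd, ' ' Zs, '_' Pc, '-' Pd,
# '(' Ps, ')' Pe, '$' Sc, '+' Sm, '\n' Cc
```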

Example of iterators
Expression	Result
foob.*r	matches strings like `foobar`, `foobalkjdflkj9r` and `foobr`
foob.+r	matches strings like `foobar`, `foobalkjdflkj9r` but not `foobr`
foob.?r	matches strings like `foobar`, `foobbr` and `foobr` but not `foobalkj9r`
fooba{2}r	matches the string `foobaar`
fooba{2,}r	matches strings like `foobaar`, `foobaaar`, `foobaaaar` etc.
fooba{2,3}r	matches strings like `foobaar`, or `foobaaar` but not `foobaaaar`
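
The iterator examples can be compared side by side with a small helper. The sketch assumes Python's `re` module in place of TRegExpr; `matching` is a hypothetical helper name, not part of any library.

```python
import re

tests = ["foobr", "foobar", "foobaar", "foobaaar", "foobaaaar"]

def matching(pattern):
    """Return the test strings that the pattern matches in full."""
    return [s for s in tests if re.fullmatch(pattern, s)]

print(matching(r"foob.?r"))      # ['foobr', 'foobar']
print(matching(r"fooba{2}r"))    # ['foobaar']
print(matching(r"fooba{2,}r"))   # ['foobaar', 'foobaaar', 'foobaaaar']
print(matching(r"fooba{2,3}r"))  # ['foobaar', 'foobaaar']
```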

Example of alternatives
Expression	Result
foo(bar|foo)	matches strings `foobar` or `foofoo`

Subexpressions
Expression	Result
(foobar){8,10}	matches strings which contain 8, 9 or 10 instances of the `foobar`
foob([0-9]|a+)r	matches `foob0r`, `foob1r`, `foobar`, `foobaar`, `foobaaar` etc.
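
Alternatives and subexpressions combine naturally, as the sketch below shows. It assumes Python's `re` module as a stand-in for TRegExpr, with the alternation written as a plain `|`; the extra non-matching inputs are made up for contrast.

```python
import re

# Alternation inside a group: "foo" followed by either "bar" or "foo".
alt = [s for s in ["foobar", "foofoo", "foobaz"]
       if re.fullmatch(r"foo(bar|foo)", s)]

# Either one digit or one or more "a" between "foob" and "r".
inputs = ["foob0r", "foobar", "foobaaar", "foobr", "foob0ar"]
sub = [s for s in inputs if re.fullmatch(r"foob([0-9]|a+)r", s)]

# An iterator after a group repeats the whole group.
nine = re.fullmatch(r"(foobar){8,10}", "foobar" * 9) is not None
seven = re.fullmatch(r"(foobar){8,10}", "foobar" * 7) is not None

print(alt)          # ['foobar', 'foofoo']
print(sub)          # ['foob0r', 'foobar', 'foobaaar']
print(nine, seven)  # True False
```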

Examples of Perl extensions
Expression	Result
(?i)Saint-Petersburg	matches `Saint-petersburg` and `Saint-Petersburg`
(?i)Saint-(?-i)Petersburg	matches `Saint-Petersburg` but not `Saint-petersburg`
(?i)(Saint-)?Petersburg	matches `Saint-petersburg` and `saint-petersburg`
((?i)Saint-)?Petersburg	matches `saint-Petersburg` but not `saint-petersburg`
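
Inline modifiers can be approximated in Python, with one caveat: Python's `re` module only accepts a bare `(?i)` at the very start of a pattern and does not support switching it off mid-pattern with `(?-i)`. The sketch below therefore uses the scoped form `(?i:...)` (Python 3.6 or later) as an assumed equivalent of the second and fourth rows; it is not TRegExpr syntax.

```python
import re

def full(pattern, text):
    """Return True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None

# (?i) at the start makes the whole pattern case-insensitive.
whole = full(r"(?i)Saint-Petersburg", "Saint-petersburg")

# Only "Saint-" is case-insensitive; "Petersburg" stays case-sensitive.
scoped = r"(?i:Saint-)Petersburg"
scoped_ok = full(scoped, "saint-Petersburg")
scoped_bad = full(scoped, "Saint-petersburg")

# Optional, case-insensitive prefix followed by a case-sensitive city name.
optional = r"(?i:Saint-)?Petersburg"
optional_ok = full(optional, "saint-Petersburg")
optional_bad = full(optional, "saint-petersburg")

print(whole)                      # True
print(scoped_ok, scoped_bad)      # True False
print(optional_ok, optional_bad)  # True False
```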