Regular expressions
Regular expressions are a very powerful search tool. They allow you to search for complex classes of words. Regular expressions are mainly meant for professionals, but can also be useful in the office for finding certain documents (see examples below).
Total Commander supports regular expressions in the following functions:
- Commands - Search (in file name and file contents)
Regular expressions consist of normal characters and special characters, so-called meta-characters. The following characters are meta-characters or initial parts of meta-characters:
. \ ( ) [ ] { } ^ $ + * ? (only in character classes: - )
Normal characters:
test finds the string "test" in the searched text. Note: This finds "test" ANYWHERE in a file name or on a line in text.
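The "anywhere" behaviour can be tried out with a minimal sketch in Python. Python's re module is a different regex engine from the one Total Commander uses, and the file names below are made up for illustration, so treat this only as an approximation of what the search dialog does.

```python
import re

# Hypothetical file names, used only for illustration.
names = ["test.txt", "latest_report.doc", "contest-2024.pdf", "notes.md"]

# re.search looks for the pattern anywhere in the string, so "test"
# is also found inside "latest" and "contest".
hits = [n for n in names if re.search(r"test", n)]
print(hits)
```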
Escape sequences:
A backslash \ starts an Escape sequence. Examples for escape sequences:
\t Tab character
\xnn Character with hexadecimal code nn. Example: \x20 is the space character. The character table charmap.exe (if installed) shows the character code of most special characters. You can use the Windows calculator in scientific mode to convert from decimal to hex.
\x{nnnn} Unicode character with hexadecimal code nnnn. Note that Total Commander uses Unicode for file names, so you need to use this notation for characters not in your local codepage.
\[ Left square bracket. Since the square brackets are meta-characters, they need to be written as \[ to search for them in the target string.
\\ Finds a backslash.
\. Finds a dot ("." alone finds any character, see below).
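The escape sequences above can be checked with a short Python sketch. This assumes Python's re module, which is not the library used by Total Commander: it accepts \t, \xnn, \[, \\ and \. as described, but it writes a Unicode character as \uXXXX rather than \x{nnnn}. The sample text is invented for the example.

```python
import re

# Sample text containing spaces, a tab, brackets, a dot and a backslash.
text = "a b\tc [draft] v1.0 C:\\temp"

spaces = re.findall(r"\x20", text)     # \x20 is the space character
tabs = re.findall(r"\t", text)         # tab character
brackets = re.findall(r"\[", text)     # literal left square bracket
backslashes = re.findall(r"\\", text)  # literal backslash
dots = re.findall(r"\.", text)         # literal dot
anything = re.findall(r".", text)      # unescaped dot: any character

# Python spells a Unicode code point as \uXXXX instead of \x{nnnn}.
e_acute = re.findall(r"\u00e9", "caf\u00e9")

print(len(spaces), len(tabs), brackets, backslashes, dots, len(anything))
```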
Character classes
Characters in square brackets build a character class. It will find exactly one character from this class. A dash allows you to define ranges, e.g. [a-z]. A ^ at the beginning finds all characters except for those listed.
Examples:
[aeiou] Finds exactly one of the listed vowels.
[^aeiou] Finds exactly one character that is not one of the listed vowels.
M[ae][iy]er Finds a Mr. Meier in all possible spellings: Mayer, Meyer, Maier, Meier. Very useful if you cannot remember the exact spelling of a name.
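The character class examples can be reproduced with a small Python sketch. Python's re module is assumed here as a stand-in for the Total Commander search, and the extra names "Mueller" and "Moyer" are added only to show what the class rejects.

```python
import re

names = ["Mayer", "Meyer", "Maier", "Meier", "Mueller", "Moyer"]

# Each bracket pair stands for exactly one character from its class.
pattern = re.compile(r"M[ae][iy]er")
matches = [n for n in names if pattern.fullmatch(n)]

# A class and its negation split a word into vowels and everything else.
vowels = re.findall(r"[aeiou]", "regular")
non_vowels = re.findall(r"[^aeiou]", "regular")

print(matches, vowels, non_vowels)
```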
Meta-characters
Here is a list of the most important meta-characters:
^ Line start
$ Line end
. Any character
\w a letter, digit or underscore _
\W the opposite of \w
\d a digit
\D no digit
\s a word separator (space, tab etc)
\S no word separator
\b finds a word boundary (the position between a \w and a \W character)
\B the opposite of \b
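A few of the meta-characters listed above, sketched in Python. The re module is assumed as an approximation of the Total Commander engine (for plain ASCII text the two should agree on these basics), and the sample strings are invented.

```python
import re

line = "report_2023 final v2"

digits = re.findall(r"\d", line)           # every single digit
words = re.findall(r"\w+", line)           # runs of letters, digits, underscore
separators = re.findall(r"\s", line)       # whitespace characters
starts = re.search(r"^report", line)       # anchored at the line start
ends = re.search(r"v2$", line)             # anchored at the line end

# \b only lets "final" match where it stands as a whole word.
whole = re.findall(r"\bfinal\b", "final finalist semifinal")

print(digits, words, separators, bool(starts), bool(ends), whole)
```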
Iterators
Iterators are used for a repetition of the character or expression to the left of the iterator.
? zero or one occurrence
* zero or more occurrences
+ one or more occurrences
{n} exactly n occurrences
{n,} at least n occurrences
{n,m} at least n and at most m occurrences
All these operators are "greedy", which means that they take as many characters as they can get. Putting a question mark ? after an operator makes it "non-greedy", i.e. it takes only as many characters as needed.
Example: "b+" applied to the target string "abbbbc" finds "bbbb", "b+?" finds just "b".
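The greedy and non-greedy behaviour can be verified with this Python sketch. It assumes Python's re module, which handles these iterators the same way for the example string from the text.

```python
import re

target = "abbbbc"

greedy = re.search(r"b+", target).group()       # takes as many b's as possible
lazy = re.search(r"b+?", target).group()        # takes as few b's as possible
bounded = re.search(r"b{2,3}", target).group()  # between 2 and 3, greedy
exact = re.fullmatch(r"ab{4}c", target)         # exactly four b's

print(greedy, lazy, bounded, bool(exact))
```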
Alternatives
Alternatives are put in round brackets, and are separated by a vertical bar (|).
Example: (John|James|Peter) finds one of the names John, James or Peter.
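The alternatives example, sketched in Python with an invented sentence. Python's re module is assumed as a stand-in for the Total Commander search.

```python
import re

sentence = "Peter met John and Jim"

# Each match is exactly one of the listed names; "Jim" is not in the list.
found = re.findall(r"(John|James|Peter)", sentence)
print(found)
```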
Look ahead/behind
Look ahead and look behind state a condition that the text must be followed or preceded by other text, which is then not part of the result.
Negative look ahead: Find a 'q' not followed by a 'u':
q(?!u) matches the 'q' in "Iraq" but not in "question"
Positive look ahead: Find a 'q' followed by a 'u':
q(?=u) matches the 'q' in "question" but not in "Iraq"
Negative look behind: Find a 'u' without a 'q' in front of it:
(?<!q)u matches the 'u' in "sun" but not in "question"
Positive look behind: Find a 'u' with a 'q' in front of it:
(?<=q)u matches the 'u' in "question" but not in "sun"
Important note:
The regex library used has a limitation: look ahead must be at the end of the regular expression, and look behind must be at the start. Otherwise an error will be shown.
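The four look ahead/behind forms can be tried in Python, whose re module accepts the same syntax. This is only a sketch: Python does not have the position limitation mentioned in the note above, and the word "sun" is chosen here simply as a word containing a 'u' that has no 'q' in front of it.

```python
import re

neg_ahead = re.search(r"q(?!u)", "Iraq")        # q not followed by u
neg_ahead_miss = re.search(r"q(?!u)", "question")
pos_ahead = re.search(r"q(?=u)", "question")    # q followed by u
pos_ahead_miss = re.search(r"q(?=u)", "Iraq")

neg_behind = re.search(r"(?<!q)u", "sun")       # u without q in front
neg_behind_miss = re.search(r"(?<!q)u", "question")
pos_behind = re.search(r"(?<=q)u", "question")  # u with q in front
pos_behind_miss = re.search(r"(?<=q)u", "sun")

# The text inside the look ahead/behind is not part of the result.
print(pos_ahead.group(), pos_behind.group())
```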
Backreferences
\1 Finds what was previously found in the first round brackets ().
\n Finds subexpression n another time in the search result.
Example 1: \s([0-9]+) \1\s matches a number followed by a space and again the same number
Example 2: (.+)\1+ finds e.g. abab (where the first ab is found by .+ and the second by \1+ )
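Both backreference examples can be checked with a Python sketch. Python's re module is assumed; it uses the same \1 notation inside the search expression. The sample strings are invented.

```python
import re

# Example 1: the same number twice, separated by a space.
same = re.search(r"\s([0-9]+) \1\s", "id 42 42 ok")
different = re.search(r"\s([0-9]+) \1\s", "id 42 43 ok")

# Example 2: a piece of text that is immediately repeated.
repeated = re.search(r"(.+)\1+", "abab")

print(same.group(1), different, repeated.group(), repeated.group(1))
```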
Subroutine calls
(?n) Inserts the sub-expression number n. The difference from a backreference is that a backreference matches the text found by an already processed sub-expression, while a subroutine call inserts the original sub-expression as is.
Example: if we are trying to match the string "abc_def" using regexp "([a-z]+)_\1", it will not match, because the first part "abc" is not repeated, it should have been "abc_abc". But if we use "([a-z]+)_(?1)", it will effectively turn into "([a-z]+)_([a-z]+)", and will match the string, with the first part "abc" and the second "def".
(?&name) Similar to (?n), but for named sub-expressions instead of the numbered ones.
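Python's re module does not support the (?n) syntax, so the following sketch only imitates the effect described above: it builds the expanded expression by inserting the same sub-expression text twice. The variable name word is hypothetical, and the second capturing group follows the expansion shown in the example rather than the behaviour of any particular library.

```python
import re

word = r"[a-z]+"  # hypothetical name for the shared sub-expression

# Backreference: the second part must repeat the text found by the first.
backref = re.compile(rf"({word})_\1")

# Imitated subroutine call: the sub-expression itself is inserted again.
expanded = re.compile(rf"({word})_({word})")

print(backref.fullmatch("abc_def"))            # no match
print(backref.fullmatch("abc_abc").group())
print(expanded.fullmatch("abc_def").groups())
```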
Subexpressions for search+replace
Text parts in round brackets are taken as subexpressions. Up to 89 subexpressions are supported now.
Example: To swap the title and artist in the file name of an mp3 file, when they are separated by a dash (Title - Artist.mp3), use the following options:
Search for: (.*) - (.*)\.mp3
Replace by: $2 - $1.mp3
Here $1 means the text in the first round bracket, and $2 the text in the second round bracket.
In "Replace by", use parameters \U, \L, and \F to convert the placeholder text behind it to uppercase, lowercase, or first char in word uppercase, e.g.
Search for: (.*) - (.*)\.mp3
Replace by: \U$2 - \L$1.mp3
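The same swap can be sketched in Python for testing an expression outside Total Commander. Two differences are assumptions to keep in mind: Python's re.sub writes the placeholders as \1 and \2 instead of $1 and $2, and it has no \U or \L, so the case conversion is imitated with a replacement function. The file name is made up.

```python
import re

name = "Yesterday - The Beatles.mp3"  # hypothetical file name
search = r"(.*) - (.*)\.mp3"

# Plain swap: \2 and \1 play the role of $2 and $1.
swapped = re.sub(search, r"\2 - \1.mp3", name)

# Imitation of "\U$2 - \L$1.mp3" using a replacement function.
def convert(match):
    return f"{match.group(2).upper()} - {match.group(1).lower()}.mp3"

converted = re.sub(search, convert, name)

print(swapped)
print(converted)
```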
Subexpressions: special groups
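(?>expr) Atomic grouping: Once the group has matched, the search does not go back into it to try a different alternative. For example, (?>bc|b) finds bc where possible and does not fall back to just b. Note: This option is not supported by 'Everything'!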
Example: a(?>bc|b)c finds "abcc" (bc before the |) but not "abc" (b after the |)
(?:expr) Non-capturing group: Finds the expression "expr" but does not add the text found in () to the index
Example: (?:.*)_(.*) finds part1_part2 but only adds part2 to the index, accessible via $1
(?P<name>expr) Like a normal subexpression, but stores the result in "name". The <> must be used!
Example: (?P<filename>[a-z]+) puts result in "filename", can be accessed via (?P=filename)
(?'name'expr) Alternative syntax for (?P<name>expr).
(?P=name) Access to the previously defined subexpression with the given name
(?#comment text) A comment, will be ignored by the parser but can be used to explain what a regular expression does.
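(?R) Recursion: Inserts the whole regular expression again at this position. Written as (?R)? the recursion is optional, so the expression finds the same number of items on the left as on the right.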
Example: a(?R)?z finds e.g. az or aazz or aaaaazzzzz
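Some of these special groups use the same syntax in Python's re module, so they can be tried there. The sketch below covers only non-capturing groups, named groups and comments. Recursion is not available in Python's re module and atomic groups only in recent versions, so they are left out. The sample strings are invented.

```python
import re

# Non-capturing group: only the second part is stored, as group 1.
parts = re.fullmatch(r"(?:.*)_(.*)", "part1_part2")

# Named group, referenced again later with (?P=filename).
named = re.fullmatch(r"(?P<filename>[a-z]+)\.(?P=filename)", "abc.abc")
not_repeated = re.fullmatch(r"(?P<filename>[a-z]+)\.(?P=filename)", "abc.def")

# A comment inside the expression is ignored by the parser.
year = re.search(r"(?#four-digit year)\d{4}", "backup_2024.zip")

print(parts.groups(), named.group("filename"), not_repeated, year.group())
```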
Modifiers
Modifiers are used for changing behaviour of regular expressions.
(?i) Ignore upper-/lowercase. In Total Commander, this is the default for file names.
(?-i) Case-sensitive matching.
(?g) Switches on "greedy" mode (active by default)
(?-g) Turns off "greedy" mode, so "+" means the same as "+?"
The other modifiers are not relevant for Total Commander, because the program only supports searching within one line.
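The effect of the case modifier can be sketched in Python, with some caveats. Python's re module is case-sensitive by default, it accepts (?i) only at the start of the expression, it switches case sensitivity back on only for a bracketed part written as (?-i:...), and it has no (?g) or (?-g) modifier. The file names are invented.

```python
import re

# Without a modifier, Python compares case-sensitively.
sensitive = re.search(r"readme", "README.TXT")

# (?i) at the start makes the whole expression ignore case.
insensitive = re.search(r"(?i)readme", "README.TXT")

# Python's scoped form: ignore case overall, except inside (?-i:...).
scoped_hit = re.search(r"(?i)total(?-i:CMD)", "TotalCMD")
scoped_miss = re.search(r"(?i)total(?-i:CMD)", "Totalcmd")

print(sensitive, bool(insensitive), bool(scoped_hit), scoped_miss)
```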
Some of the above explanations are from the help file for this library.