Regular Expressions
Revision as of 12:35, 16 March 2022 by Xbl (Xbl moved page Regex to Regular Expressions without leaving a redirect)
| Metacharacter | Description |
|---|---|
| `\` | Escape character. |
| `?` | Matches the preceding element zero or one time. For example, `ab?c` matches only "ac" or "abc". |
| `+` | Matches the preceding element one or more times. For example, `ab+c` matches "abc", "abbc", "abbbc", and so on, but not "ac". |
| `\|` | The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, `abc\|def` matches "abc" or "def". |
| `*` | Matches the preceding element zero or more times. For example, `ab*c` matches "ac", "abc", "abbbc", etc. `[xyz]*` matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. `(ab)*` matches "", "ab", "abab", "ababab", and so on. |
| `^` | Matches the starting position within the string. In line-based tools, it matches the starting position of any line. |
| `.` | Matches any single character. Many applications exclude newlines, and exactly which characters count as newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is among them. Within POSIX bracket expressions, the dot character matches a literal dot. For example, `a.c` matches "abc", etc., but `[a.c]` matches only "a", ".", or "c". |
| `[ ]` | A bracket expression. Matches a single character that is contained within the brackets. For example, `[abc]` matches "a", "b", or "c". `[a-z]` specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: `[abcx-z]` matches "a", "b", "c", "x", "y", or "z", as does `[a-cx-z]`. |
| `[^ ]` | Matches a single character that is not contained within the brackets. For example, `[^abc]` matches any character other than "a", "b", or "c". `[^a-z]` matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed. |
| `$` | Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line. |
| `( )` | Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, `\n`). A marked subexpression is also called a block or capturing group. BRE mode requires `\( \)`. |
| `\n` | Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. Also known as a backreference. This construct is vaguely defined in the POSIX.2 standard, and some tools allow referencing more than nine capturing groups. Backreferences are only supported in BRE mode. |
| `{m,n}` | Matches the preceding element at least m and not more than n times. For example, `a{3,5}` matches only "aaa", "aaaa", and "aaaaa". A few older regex implementations lack this construct. BRE mode requires `\{m,n\}`. |
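
The quantifier, alternation, backreference, and anchor rows of the table can be tried out with a short script. This is a minimal sketch that assumes Python's standard `re` module, which is not a POSIX engine: its syntax is close to ERE for these constructs (unescaped parentheses and braces), yet it also accepts backreferences. The helper `full` and the sample word lists are illustrative names, not part of any tool described here.

```python
import re


def full(pattern, candidates):
    """Return the candidates that `pattern` matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


words = ["ac", "abc", "abbc", "abbbc"]

optional = full(r"ab?c", words)    # ? : zero or one "b"
one_plus = full(r"ab+c", words)    # + : one or more "b"
zero_plus = full(r"ab*c", words)   # * : zero or more "b"

# {m,n} : between three and five repetitions of "a"
bounded = full(r"a{3,5}", ["aa", "aaa", "aaaa", "aaaaa", "aaaaaa"])

# | : either the expression on the left or the one on the right
either = full(r"abc|def", ["abc", "def", "abcdef"])

# \1 recalls whatever the first capturing group matched
doubled = full(r"(ab)\1", ["abab", "abba"])

# In Python, ^ and $ match at every line only when re.MULTILINE is set;
# by default they anchor to the whole string.
text = "cat\nhat"
per_line = re.findall(r"^[hc]at$", text, flags=re.MULTILINE)
whole_string = re.findall(r"^[hc]at$", text)

# The dot excludes the line feed unless re.DOTALL is given.
dot_default = re.fullmatch(r"a.c", "a\nc") is not None
dot_all = re.fullmatch(r"a.c", "a\nc", flags=re.DOTALL) is not None
```

`re.fullmatch` is used so that a pattern must account for the entire candidate string, which mirrors the "matches only" wording in the table; `re.search` would also report strings that merely contain a match.
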
Examples:
- `[hc]?at` matches "at", "hat", and "cat".
- `[hc]*at` matches "at", "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on.
- `[hc]+at` matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at".
- `cat|dog` matches "cat" or "dog".
- `.at` matches any three-character string ending with "at", including "hat", "cat", "bat", "4at", "#at" and " at" (starting with a space).
- `[hc]at` matches "hat" and "cat".
- `[^b]at` matches all strings matched by `.at` except "bat".
- `[^hc]at` matches all strings matched by `.at` other than "hat" and "cat".
- `^[hc]at` matches "hat" and "cat", but only at the beginning of the string or line.
- `[hc]at$` matches "hat" and "cat", but only at the end of the string or line.
- `\[.\]` matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]", "[b]", "[7]", "[@]", "[]]", and "[ ]" (bracket space bracket).
- `s.*` matches s followed by zero or more characters, for example: "s", "saw", "seed", "s3w96.7", and "s6#h%(>>>m n mQ".
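
The examples above can be checked mechanically. The sketch below again assumes Python's `re` module, whose behaviour for these particular patterns agrees with the descriptions given, though other flavors may differ in details. The matching strings are copied from the examples; the non-matching strings, the `cases` table, and the `failures` helper are additions made for illustration.

```python
import re

# pattern -> (strings expected to match, strings expected not to match)
cases = {
    r"[hc]?at": (["at", "hat", "cat"], ["hhat"]),
    r"[hc]*at": (["at", "hat", "cat", "hhat", "chat", "hcat", "cchchat"], ["bat"]),
    r"[hc]+at": (["hat", "cat", "hhat", "chat", "hcat", "cchchat"], ["at"]),
    r"cat|dog": (["cat", "dog"], ["cag"]),
    r".at": (["hat", "cat", "bat", "4at", "#at", " at"], ["at"]),
    r"[hc]at": (["hat", "cat"], ["bat"]),
    r"[^b]at": (["hat", "cat", "4at"], ["bat"]),
    r"[^hc]at": (["bat", "4at"], ["hat", "cat"]),
    r"\[.\]": (["[a]", "[b]", "[7]", "[@]", "[]]", "[ ]"], ["a", "[ab]"]),
    r"s.*": (["s", "saw", "seed", "s3w96.7", "s6#h%(>>>m n mQ"], ["as"]),
}


def failures(table):
    """Return (pattern, text) pairs whose result differs from the expectation."""
    bad = []
    for pattern, (should_match, should_not) in table.items():
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                bad.append((pattern, text))
        for text in should_not:
            if re.fullmatch(pattern, text) is not None:
                bad.append((pattern, text))
    return bad


mismatches = failures(cases)

# The anchored examples are about position, so they are tested with
# re.search against longer strings instead of re.fullmatch.
starts = [s for s in ["hat trick", "that"] if re.search(r"^[hc]at", s)]
ends = [s for s in ["top hat", "hats"] if re.search(r"[hc]at$", s)]
```

An empty `mismatches` list means every listed expectation held under this engine; a non-empty list names the pattern and string that disagreed.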