Regular Expressions

From Leo's Notes
Last edited on 4 January 2022, at 19:40.

A cheat sheet for regular expressions.

Special characters and their meanings

Character  Meaning                                                       Example
*          Match zero or more of the previous item                       Ca* matches C, Ca, or Caaaa
?          Match zero or one of the previous item                        Ca? matches C or Ca, but not Caaa
+          Match one or more of the previous item                        Ca+ matches Ca or Caaaa, but not C
\          Escape a special character such as \ ? ( ) [ ] |              C:\\ matches C:\
.          Wildcard; matches any single character                        Ca.* matches Canada, Can't, Caught, etc.
( )        Group characters; reusable as \1, \2, etc. in replacements    See the example for |
[ ]        Match any one character from a set or range                   [cbf]ar matches car, bar, far
                                                                         [0-9]+ matches one or more digits, i.e. any non-negative integer
                                                                         [a-zA-Z] matches one ASCII letter (upper or lower case)
                                                                         [^0-9] matches any character that is not 0-9
|          Match either the previous or the next character/group         (Mon|Tues)day matches "Monday" or "Tuesday"
{ }        Match a specified number of occurrences of the previous item  [a-z]{3} matches any 3-character string of letters a-z, e.g. abc, xyz
                                                                         [0-9]{2,4} matches any 2- to 4-digit number, e.g. 23, 349, 5934
                                                                         [a-z0-9]{2,} matches any string of 2 or more lowercase letters or digits, e.g. a9, c1cd
^          Beginning of a string, or negation when first inside [ ]      ^http:\/\/ matches anything starting with http:// (the / is escaped only for tools that use / as the pattern delimiter)
                                                                         [^0-9] matches any character that is not 0-9
$          End of a string                                               \.exe$ matches anything ending with .exe
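
A few rows of the table above can be checked from the shell. The following is a minimal sketch, assuming a grep that supports -E (extended regular expressions), -x (match the whole line) and -q; the helper names matches and report are made up for this example and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers for trying out patterns; not part of grep itself.

# matches PATTERN STRING: succeed if the whole STRING matches PATTERN.
# -x anchors the pattern to the whole line, -q suppresses output.
matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# report PATTERN STRING: print whether STRING matches PATTERN.
report() {
    if matches "$1" "$2"; then
        printf '%s matches %s\n' "$1" "$2"
    else
        printf '%s does not match %s\n' "$1" "$2"
    fi
}

report 'Ca*' 'C'                    # zero or more
report 'Ca?' 'Caaa'                 # zero or one, so three a's fail
report 'Ca+' 'C'                    # one or more, so a bare C fails
report 'C:\\' 'C:\'                 # escaped backslash
report '[cbf]ar' 'bar'              # character set
report '(Mon|Tues)day' 'Tuesday'    # alternation inside a group
report '[0-9]{2,4}' '5934'          # bounded repetition
report '[a-z0-9]{2,}' 'a9'          # two or more

# Without -x, ^ and $ do the anchoring explicitly.
printf '%s\n' setup.exe exercise.txt | grep -E '\.exe$'
```

Because -x already anchors the pattern at both ends, the helper tests whole strings; the last command shows $ used on its own and should print only setup.exe.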

POSIX character classes

Character Class  Meaning
[:alpha:]        Any letter, [A-Za-z]
[:upper:]        Any uppercase letter, [A-Z]
[:lower:]        Any lowercase letter, [a-z]
[:digit:]        Any digit, [0-9]
[:alnum:]        Any alphanumeric character, [A-Za-z0-9]
[:xdigit:]       Any hexadecimal digit, [0-9A-Fa-f]
[:space:]        A tab, new line, vertical tab, form feed, carriage return, or space
[:blank:]        A space or a tab
[:print:]        Any printable character, including the space
[:punct:]        Any punctuation character: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[:graph:]        Any printable character except the space
[:word:]         Any alphanumeric character or underscore, [A-Za-z0-9_] (not part of POSIX; an extension in tools such as Perl and PCRE)
[:ascii:]        Any ASCII character, in the range 0-127 (not part of POSIX; an extension in tools such as Perl and PCRE)
[:cntrl:]        Any control character (in ASCII, codes 0-31 and 127)

These classes are only valid inside a bracket expression: [[:digit:]]+ matches one or more digits, whereas [:digit:]+ on its own does not.
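
As a rough illustration of these classes in use, the sketch below runs a few sed substitutions. It assumes the C locale (set explicitly, so the classes cover ASCII only); the sample strings and the variable names digits, ident and cleaned are invented for the example.

```shell
#!/bin/sh
# Assumption: pin the C locale so each class covers ASCII characters only.
LC_ALL=C
export LC_ALL

# A class must sit inside a bracket expression: [[:digit:]], not [:digit:].
# Delete everything that is not a digit.
digits=$(printf '%s\n' 'Order #4521, qty 3' | sed 's/[^[:digit:]]//g')
echo "$digits"     # 45213

# Classes can be mixed with other characters in one bracket expression.
# Keep only the leading identifier (letter or _, then letters, digits, _).
ident=$(printf '%s\n' 'max_value = 10' | sed 's/^\([[:alpha:]_][[:alnum:]_]*\).*/\1/')
echo "$ident"      # max_value

# Replace each run of punctuation with a single space.
cleaned=$(printf '%s\n' 'a,b;;c' | sed 's/[[:punct:]]\{1,\}/ /g')
echo "$cleaned"    # a b c
```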

Other notes

Usage with sed

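By default, sed uses basic regular expressions, in which ( ) and { } are literal characters unless escaped: write \( \) for a group and \{ \} for a repetition count. A literal / must also be escaped when / is the delimiter of the s command. For example: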

$ sed 's/\(9[0-9]\{1\}%\)/<strong style="color:orange">\1<\/strong>/' file
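
For comparison, the same substitution can be written with extended regular expressions. This is a sketch under two assumptions: that the local sed accepts -E (GNU and BSD sed do), and that # is free to use as the delimiter because it does not occur in the pattern or the replacement. The sample input line and the variable names line, bre and ere are invented for the example.

```shell
#!/bin/sh
line='CPU usage: 93%'    # made-up sample input

# Basic syntax: the group and the repetition count need backslashes,
# and / is escaped because it is the delimiter.
bre=$(printf '%s\n' "$line" |
    sed 's/\(9[0-9]\{1\}%\)/<strong style="color:orange">\1<\/strong>/')

# Extended syntax (-E) with # as the delimiter: no backslashes needed
# for ( ) and { }, and the / in </strong> can stay unescaped.
ere=$(printf '%s\n' "$line" |
    sed -E 's#(9[0-9]{1}%)#<strong style="color:orange">\1</strong>#')

echo "$bre"
echo "$ere"
```

Both commands should print the same line, with 93% wrapped in the strong tag.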