CHAPTER 7 Text Processing
. match any single character except <newline>
* match zero or more instances of the single character (or meta-character) immediately preceding it
[abc] match any of the characters enclosed
[a-d] match any character in the enclosed range
[^exp] match any character not in the enclosed expression
^abc the regular expression must start at the beginning of the line (Anchor)
abc$ the regular expression must end at the end of the line (Anchor)
\ treat the next character literally. This is normally used to escape the meaning of special characters such as "." and "*".
\{n,m\} match the single character or expression preceding it a minimum of n times and a maximum of m times (0 through 255 are allowed for n and m). The \{ and \} sets should be thought of as single operators. In this case the \ preceding the brace does not remove a special meaning but rather gives the brace one.
\<abc\> match the enclosed regular expression only when it forms a separate word. A word begins at the start of a line or after any character other than a letter, digit, or underscore (_), and ends at the end of a line or before such a character. Again, the \< and \> sets should be thought of as single operators.
\(abc\) save the enclosed pattern in a buffer. Up to nine patterns can be saved for each line, and you can reference them later with the \n operator. Again, the \( and \) sets should be thought of as single operators.
\n where n is between 1 and 9. This matches the nth expression previously saved for this line. Expressions are numbered starting from the left. The \n should be thought of as a single operator.
& insert the text matched by the search pattern (used in the replacement string)
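The operators above can be tried out interactively. The sketch below uses Python's re module rather than grep or sed, so treat it as an illustration, not a description of those programs. Python uses extended-style syntax: \{n,m\} and \( \) are written without backslashes as {n,m} and ( ). It has no \< or \>, so \b (a word boundary on either side) is used here as an approximation. In re.sub, \g<0> plays the role of & in the replacement string. The helper name matches() is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text, as grep would."""
    return re.search(pattern, text) is not None

# .  any single character except newline
assert matches(r"c.t", "cut")
assert not matches(r"c.t", "c\nt")

# *  zero or more of the single preceding character
assert matches(r"ab*c", "ac")
assert matches(r"ab*c", "abbbc")

# [abc], [a-d] and [^exp]
assert matches(r"[abc]", "xbx")
assert matches(r"[a-d]", "zzc")
assert not matches(r"[^0-9]", "123")

# ^ and $ anchors
assert matches(r"^abc", "abcdef")
assert not matches(r"^abc", "xabc")
assert matches(r"abc$", "xyzabc")

# \ makes a special character literal
assert matches(r"a\.b", "a.b")
assert not matches(r"a\.b", "axb")

# BRE \{2,3\} is written {2,3} in Python
assert matches(r"^x{2,3}$", "xxx")
assert not matches(r"^x{2,3}$", "xxxx")

# BRE \<cat\> is approximated here by \bcat\b
# (assumption: Python's word characters are close enough to grep's)
assert matches(r"\bcat\b", "the cat sat")
assert not matches(r"\bcat\b", "concatenate")

# BRE \(ab\)c\1 is written (ab)c\1 in Python
assert matches(r"^(ab)c\1$", "abcab")

# Saved patterns reused in a replacement: swap two words
print(re.sub(r"^([a-z]*) ([a-z]*)$", r"\2 \1", "hello world"))  # world hello

# & in a sed replacement corresponds to \g<0> in re.sub
print(re.sub(r"[0-9][0-9]*", r"<\g<0>>", "room 42, floor 7"))   # room <42>, floor <7>
```

The rough sed equivalent of the last line would be s/[0-9][0-9]*/<&>/g.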
There are a few meta-characters used only by awk and egrep. These are:
+ match one or more of the preceding expression
? match zero or one of the preceding expression
| separator. Match either the preceding or following expression.
( ) group the enclosed regular expressions so that an operator such as +, ?, or * applies to the whole group.
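Python's re module accepts these extended operators with the same spelling that egrep uses. The short sketch below exercises each one. The helper name matches() and the sample strings are made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text, as egrep would."""
    return re.search(pattern, text) is not None

# +  one or more of the preceding expression
assert matches(r"^ab+c$", "abbc")
assert not matches(r"^ab+c$", "ac")

# ?  zero or one of the preceding expression
assert matches(r"^colou?r$", "color")
assert matches(r"^colou?r$", "colour")
assert not matches(r"^colou?r$", "colouur")

# |  either the preceding or the following expression
assert matches(r"^(cat|dog)$", "dog")
assert not matches(r"^(cat|dog)$", "cow")

# ( )  grouping: + repeats the whole "ab", not just the "b"
assert matches(r"^(ab)+$", "ababab")
assert not matches(r"^(ab)+$", "abba")
```

Without the parentheses, ^cat|dog$ would mean "cat at the start of the line, or dog at the end". This is because | binds more loosely than the anchors.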
Some examples of the more commonly used regular expressions are:
regular expression matches
cat the string cat
.at any occurrence of a single character followed by at, such as cat, rat, mat, bat, fat, hat
xy*z any occurrence of an x, followed by zero or more y's, followed by a z.
^cat cat at the beginning of the line
cat$ cat at the end of the line
\* any occurrence of an asterisk
[cC]at cat or Cat
[^a-zA-Z] any occurrence of a non-alphabetic character
[0-9]$ any line ending with a digit
[A-Z][A-Z]* one or more upper case letters
[A-Z]* zero or more upper case letters (In other words, any line at all, because zero occurrences always match.)
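The examples in the table can be checked against sample lines. The sketch below uses Python's re.search, which, like grep, reports a line as matching if the pattern occurs anywhere in it. The sample lines and expected results are made up for illustration.

```python
import re

# Each entry: (pattern from the table, sample line, expected result).
EXAMPLES = [
    (r"cat", "concatenate", True),
    (r".at", "the rat ran", True),
    (r"xy*z", "xz", True),
    (r"xy*z", "xyyyz", True),
    (r"^cat", "cat food", True),
    (r"^cat", "a cat", False),
    (r"cat$", "tomcat", True),
    (r"\*", "5 * 3", True),
    (r"[cC]at", "Catalog", True),
    (r"[^a-zA-Z]", "abcXYZ", False),
    (r"[0-9]$", "version 2", True),
    (r"[A-Z][A-Z]*", "no capitals here", False),
    (r"[A-Z]*", "no capitals here", True),  # an empty match still counts
]

for pattern, line, expected in EXAMPLES:
    found = re.search(pattern, line) is not None
    status = "ok" if found == expected else "UNEXPECTED"
    print(f"{pattern!r:16} {line!r:20} {found!s:5} {status}")
```

The last two rows show why [A-Z][A-Z]* is the usual way to require at least one upper case letter in a basic regular expression.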