7.2 Text Processing Commands
awk searches its input for patterns and performs the specified operation on each line, or fields of the line, that contain those patterns. You can specify the pattern matching statements for awk either on the command line, or by putting them in a file and using the -f program_file option.
Syntax
awk program [file]
where program is composed of one or more statements of the form:
pattern { action }
Each input line is checked against each pattern in turn, and the indicated action is taken on a match. This continues through the full sequence of patterns, then the next line of input is checked.
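As a minimal sketch of the pattern { action } form, the following prints the second field of every line that contains "error". The sample input and the shell variable name matches are made up for illustration.

```shell
# Sample input is supplied inline with printf; the data is illustrative only.
matches=$(printf 'error disk full\nok all clear\nerror no route\n' |
  awk '/error/ { print $2 }')
echo "$matches"
```

Lines that do not match the pattern are simply skipped, so only "disk" and "no" are printed.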
Input is divided into records and fields. The default record separator is <newline>, and the variable NR keeps the record count. The default field separator is whitespace (spaces and tabs), and the variable NF keeps the count of fields in the current record. The input field separator, FS, and the input record separator, RS, can be set at any time to match any single character. The output field separator, OFS, and the output record separator, ORS, can also be changed to any single character, as desired. $n, where n is an integer, represents the nth field of the input record, while $0 represents the entire input record.
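The variables above can be seen together in a short sketch. It assumes colon-separated sample data and sets FS and OFS in a BEGIN block, which runs before any input is read; the shell variable name fields is hypothetical.

```shell
# FS=":" splits each record on colons; OFS="-" is placed between
# the comma-separated arguments of print.
fields=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $0 }')
echo "$fields"
```

Each output line shows the record number, the field count, the first field, and the unmodified record, joined by the output field separator.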
BEGIN and END are special patterns matching the beginning of input, before the first record is read, and the end of input, after the last record is read, respectively.
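A small sketch of BEGIN and END, assuming three lines of made-up input: the BEGIN action prints a header before any record is read, the middle rule counts records, and the END action reports the total. The names summary and n are illustrative.

```shell
# The rule with no pattern runs for every record and increments n.
summary=$(printf 'one\ntwo\nthree\n' |
  awk 'BEGIN { print "start" } { n++ } END { print n, "records" }')
echo "$summary"
```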
Output is produced with the print statement and with the formatted print statement, printf.
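To contrast the two statements, this sketch prints the same two fields with print and then with printf. The input data and the format string are assumptions chosen for illustration.

```shell
# print joins its arguments with a space and appends a newline;
# printf writes exactly what the format specifies, so "\n" is explicit.
formatted=$(printf 'widget 4\ngadget 12\n' |
  awk '{ print $1, $2; printf "%-8s|%3d\n", $1, $2 }')
echo "$formatted"
```

Here %-8s left-justifies the name in eight columns and %3d right-justifies the number in three.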
Patterns may be regular expressions, arithmetic relational expressions, string-valued expressions, and boolean combinations of any of these. For the latter the patterns can be combined with the boolean operators below, using parentheses to define the combination:
|| or
&& and
! not
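A sketch of a boolean combination, using made-up name and number pairs: a line is selected when it contains "alpha" or "beta" and its second field is not greater than 20.

```shell
# Parentheses group the || test so that && and ! apply to the whole group.
picked=$(printf 'alpha 5\nbeta 15\ngamma 25\nalpha 30\n' |
  awk '(/alpha/ || /beta/) && !($2 > 20) { print $0 }')
echo "$picked"
```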
Comma separated patterns define the range for which the pattern is applicable, e.g.:
/first/,/last/
selects all lines starting with the one containing first, and continuing inclusively, through the one containing last.
To select lines 15 through 20 use the pattern range:
NR == 15, NR == 20
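Both kinds of range can be tried on a few lines of illustrative input. When a pattern has no action, awk prints the matching lines. The line numbers 2 and 3 are used here only to keep the sample short.

```shell
# Range bounded by regular expressions: from "first" through "last".
block=$(printf 'a\nfirst\nb\nlast\nc\n' | awk '/first/,/last/')
echo "$block"

# Range bounded by record numbers: lines 2 through 3.
lines=$(printf 'l1\nl2\nl3\nl4\n' | awk 'NR == 2, NR == 3')
echo "$lines"
```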
Regular expressions must be enclosed in slashes (/), and meta-characters can be escaped with the backslash (\). Within a regular expression, parentheses group sub-expressions, which can be combined with the operators:
| or, to separate alternatives
+ one or more
? zero or one
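The three operators can be compared side by side in a short sketch. The word list is made up, and each rule labels its output with the operator that matched.

```shell
regex=$(printf 'color\ncolour\ncolouur\ncat\nbird\n' | awk '
  /^colou?r$/    { print "?", $0 }   # zero or one "u"
  /^colou+r$/    { print "+", $0 }   # one or more "u"
  /^(cat|dog)$/  { print "|", $0 }   # either alternative
')
echo "$regex"
```

Note that "colour" is printed twice, because it satisfies both the ? rule and the + rule, and every rule is tried for each line.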
A regular expression match can be either of:
~ contains the expression
!~ does not contain the expression
So the program:
$1 ~ /[Ff]rank/
is true if the first field, $1, contains "Frank" or "frank" anywhere within the field. To match a field identical to "Frank" or "frank" use:
$1 ~ /^[Ff]rank$/
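The difference between the anchored and unanchored forms, and the !~ operator, can be checked with a few made-up names:

```shell
names=$(printf 'Frank 1\nfrankie 2\nBob 3\n' | awk '
  $1 ~ /^[Ff]rank$/ { print "exact:", $1 }      # whole field is Frank or frank
  $1 ~ /[Ff]rank/   { print "contains:", $1 }   # pattern occurs anywhere in the field
  $1 !~ /[Ff]rank/  { print "no match:", $1 }   # pattern does not occur
')
echo "$names"
```

"frankie" contains the expression but is not identical to it, so only the unanchored rule selects it.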
Relational expressions
< less than
<= less than or equal to
== equal to
>= greater than or equal to
!= not equal to
> greater than
Variables are not declared with a type, so it is not always obvious whether a value will be treated as a string or a number. If neither operand is known to be numeric, then a string comparison is performed. Otherwise, a numeric comparison is done. In the absence of any information to the contrary, a string comparison is done, so that:
$1 > $2
will compare the string values. To ensure a numerical comparison do something similar to:
( $1 + 0 ) > $2
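The following sketch shows why the distinction matters, using the values 10 and 9. Many current awk implementations treat fields that look like numbers as numeric, so to make the string case visible everywhere the example forces it by concatenating an empty string to each field; that trick is an addition for illustration, not part of the text above.

```shell
# s: string comparison, "10" sorts before "9" because "1" < "9".
# n: numeric comparison, forced by adding 0 to the first field.
cmp=$(printf '10 9\n' |
  awk '{ s = (($1 "") > ($2 "")); n = (($1 + 0) > $2); print s, n }')
echo "$cmp"
```

A comparison yields 1 when true and 0 when false, so the string test reports 0 and the numeric test reports 1.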
The mathematical functions exp, log and sqrt are built in.
Some other built-in functions include:
index(s,t) returns the position in string s where string t first occurs, or 0 if it does not occur
length(s) returns the length of string s
substr(s,m,n) returns the n-character substring of s, beginning at position m
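The built-in functions can be tried without any input by placing the calls in a BEGIN action. The string "hello world" is an arbitrary sample.

```shell
# Character positions are counted from 1.
funcs=$(awk 'BEGIN {
  s = "hello world"
  print index(s, "world"), length(s), substr(s, 1, 5)
  print sqrt(16), exp(0), log(1)
}')
echo "$funcs"
```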
Arrays are declared automatically when they are used, e.g.:
arr[i] = $1
assigns the first field of the current input record to the ith element of the array.
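As a sketch of array use, the following stores the first field of each record under its record number and prints the stored values in reverse order at the end of input. The sample data is made up.

```shell
# arr needs no declaration; it comes into existence on first assignment.
reversed=$(printf 'a x\nb y\nc z\n' |
  awk '{ arr[NR] = $1 } END { for (i = NR; i >= 1; i--) print arr[i] }')
echo "$reversed"
```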
Flow control statements using if-else, while, and for are allowed with C type syntax:
for (i=1; i <= NF; i++) {actions}
while (i<=NF) {actions}
if (i<NF) {actions}
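The three statements can be combined in one small program. This sketch sums the fields of each line with for, joins them in reverse order with while, and classifies the sum with if-else; the threshold 10 and the variable names are arbitrary.

```shell
flow=$(printf '1 2 3\n4 5 6\n' | awk '{
  sum = 0
  for (i = 1; i <= NF; i++) { sum += $i }      # add up every field
  rev = ""
  i = NF
  while (i >= 1) { rev = rev $i; i-- }         # concatenate fields, last first
  if (sum > 10) { print "big", sum, rev } else { print "small", sum, rev }
}')
echo "$flow"
```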
Common Options
-f program_file read the commands from program_file
-Fc use character c as the field separator character
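Both options can be used together. In this sketch the program is written to a temporary file first; the file name and the colon-separated sample data are hypothetical.

```shell
# Hypothetical program file name; any writable path would do.
prog="${TMPDIR:-/tmp}/second_field.$$.awk"
printf '%s\n' '{ print $2 }' > "$prog"

# -F: sets the field separator to a colon, -f reads the program from the file.
opts=$(printf 'root:x:0\nguest:y:1\n' | awk -F: -f "$prog")
rm -f "$prog"
echo "$opts"
```

Keeping the program in a file avoids shell quoting problems once it grows beyond a line or two.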
Examples
% cat filex | tr a-z A-Z | awk -F: '{printf ("7R %-6s %-9s %-24s \n",$1,$2,$3)}'>upload.file
This pipeline cats filex, which is formatted as follows:
nfb791:99999999:smith
7ax791:999999999:jones
8ab792:99999999:chen
8aa791:999999999:mcnulty
changes all lower case characters to upper case with the tr utility, and uses awk to format each line into the following, which is written to the file upload.file:
7R NFB791 99999999  SMITH
7R 7AX791 999999999 JONES
7R 8AB792 99999999  CHEN
7R 8AA791 999999999 MCNULTY