If you are like many computer users, you would frequently like to make
changes in various text files wherever certain patterns appear, or
extract data from parts of certain lines while discarding the rest. To
write a program to do this in a language such as C or Pascal is a
time-consuming inconvenience that may take many lines of code. The job
may be easier with awk.
The awk utility interprets a special-purpose programming language
that makes it possible to handle simple data-reformatting jobs
with just a few lines of code.
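As a minimal sketch of such a job, the following shell session extracts one piece of data from the lines that match a pattern and discards everything else. The three input lines and the variable names `sample' and `matches' are invented for this illustration; they do not come from the sample files used elsewhere in this Info file.

```shell
# Hypothetical three-line input; the words and numbers are invented.
sample='foo 10
bar 20
foofoo 30'

# Print the second field of every line that matches the regexp /foo/,
# discarding the rest of the input.
matches=$(printf '%s\n' "$sample" | awk '/foo/ { print $2 }')
printf '%s\n' "$matches"
```

The whole awk program here is the single quoted pattern-action statement; the surrounding shell code only supplies the input and shows the result.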
The GNU implementation of awk is called gawk; it is fully
upward compatible with the System V Release 4 version of
awk. gawk is also upward compatible with the POSIX
specification of the awk language. This means that all
properly written awk programs should work with gawk.
Thus, we usually don't distinguish between gawk and other awk
implementations.
1.1 Using This Book                Using this Info file. Includes sample input files that you can use.
1.2 Typographical Conventions
1.3 Data Files for the Examples    Sample data files for use in the awk programs illustrated in this Info file.
The term awk refers to a particular program, and to the language you
use to tell this program what to do. When we need to be careful, we call
the program "the awk utility" and the language "the awk
language." The term gawk refers to a version of awk developed
as part of the GNU project. The purpose of this Info file is to explain
both the awk language and how to run the awk utility.
The main purpose of the Info file is to explain the features
of awk, as defined in the POSIX standard. It does so in the context
of one particular implementation, gawk. While doing so, it will also
attempt to describe important differences between gawk and other
awk implementations. Finally, any gawk features that
are not in the POSIX standard for awk will be noted.
The term awk program refers to a program written by you in
the awk programming language.
See section Getting Started with awk, for the bare
essentials you need to know to start using awk.
Some useful "one-liners" are included to give you a feel for the
awk language (see section Useful One Line Programs).
Many sample awk programs have been provided for you
(see section A Library of awk Functions; also
see section Practical awk Programs).
The entire awk language is summarized for quick reference in
gawk Summary. Look there if you just need
to refresh your memory about a particular feature.
If you find terms that you aren't familiar with, try looking them up in the glossary (see section D. Glossary).
Most of the time complete awk programs are used as examples, but in
some of the more advanced sections, only the part of the awk program
that illustrates the concept being described is shown.
While this Info file is aimed principally at people who have not been
exposed to awk, there is a lot of information here that even the awk
expert should find useful. In particular, the description of POSIX
awk and the example programs in
A Library of awk Functions and in
Practical awk Programs
should be of interest.
     Who opened that window shade?!?
     Count Dracula
Until the POSIX standard (and The Gawk Manual),
many features of awk were either poorly documented, or not
documented at all. Descriptions of such features
(often called "dark corners") are noted in this Info file with
"(d.c.)".
They also appear in the index under the heading "dark corner."
This Info file is written using Texinfo, the GNU documentation formatting language. A single Texinfo source file is used to produce both the printed and on-line versions of the documentation. This section briefly documents the typographical conventions used in Texinfo.
Examples you would type at the command line are preceded by the common shell primary and secondary prompts, `$' and `>'. Output from the command is preceded by the glyph "-|". This typically represents the command's standard output. Error messages, and other output on the command's standard error, are preceded by the glyph "error-->". For example:
     $ echo hi on stdout
     -| hi on stdout
     $ echo hello on stderr 1>&2
     error--> hello on stderr
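The two glyphs correspond to two separate output streams. One way to see the difference is to capture each stream on its own. The following is only a sketch, assuming a POSIX-compatible shell; the variable names `out' and `err' are chosen here for illustration.

```shell
# Capture standard output only; anything on standard error is discarded.
out=$(echo hi on stdout 2>/dev/null)

# Capture standard error only: 2>&1 first points standard error at the
# capture, then >/dev/null discards the original standard output.
err=$( { echo hello on stderr 1>&2; } 2>&1 >/dev/null )

printf 'stdout: %s\n' "$out"
printf 'stderr: %s\n' "$err"
```

The order of the two redirections in the second command matters, because the shell processes them from left to right.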
Characters that you type at the keyboard look like this. In particular, there are special characters called "control characters." These are characters that you type by holding down both the CONTROL key and another key, at the same time. For example, a Control-d is typed by first pressing and holding the CONTROL key, next pressing the d key, and finally releasing both keys.
Many of the examples in this Info file take their input from two sample data files. The first, called `BBS-list', represents a list of computer bulletin board systems together with information about those systems. The second data file, called `inventory-shipped', contains information about shipments on a monthly basis. In both files, each line is considered to be one record.
In the file `BBS-list', each record contains the name of a computer bulletin board, its phone number, the board's baud rate(s), and a code for the number of hours it is operational. An `A' in the last column means the board operates 24 hours a day. A `B' in the last column means the board operates only during evening and weekend hours. A `C' means the board operates only on weekends.
     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     camelot      555-0542     300               C
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C
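To give a feel for how this file can be used, here is a small sketch that lists the boards that operate 24 hours a day, that is, the records whose last field is `A'. It writes the data to a scratch directory created with `mktemp -d', which is an assumption about your system, and the variable names `dir' and `always_up' are chosen only for this illustration.

```shell
# Create a scratch copy of the sample file (the location is an assumption).
dir=$(mktemp -d)
cat > "$dir/BBS-list" <<'EOF'
aardvark     555-5553     1200/300          B
alpo-net     555-3412     2400/1200/300     A
barfly       555-7685     1200/300          A
bites        555-1675     2400/1200/300     A
camelot      555-0542     300               C
core         555-2912     1200/300          C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
macfoo       555-6480     1200/300          A
sdace        555-3430     2400/1200/300     A
sabafoo      555-2127     1200/300          C
EOF

# Print the name (field 1) of each board whose hours code (field 4) is A.
always_up=$(awk '$4 == "A" { print $1 }' "$dir/BBS-list")
printf '%s\n' "$always_up"

# Remove the scratch directory again.
rm -r "$dir"
```

With the data shown above, this prints the five names alpo-net, barfly, bites, macfoo, and sdace, one per line.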
The second data file, called `inventory-shipped', represents information about shipments during the year. Each record contains the month of the year, the number of green crates shipped, the number of red boxes shipped, the number of orange bags shipped, and the number of blue packages shipped, respectively. There are 16 entries, covering the 12 months of one year and four months of the next year.
     Jan  13  25  15 115
     Feb  15  32  24 226
     Mar  15  24  34 228
     Apr  31  52  63 420
     May  16  34  29 208
     Jun  31  42  75 492
     Jul  24  34  67 436
     Aug  15  34  47 316
     Sep  13  55  37 277
     Oct  29  54  68 525
     Nov  20  87  82 577
     Dec  17  35  61 401
     Jan  21  36  64 620
     Feb  26  58  80 652
     Mar  24  75  70 495
     Apr  21  70  74 514
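As a sketch of the kind of question this file can answer, the following counts the records and adds up the green crates (the second field) over all of them. As before, the scratch directory from `mktemp -d' and the variable names `dir' and `summary' are assumptions made for this illustration.

```shell
# Create a scratch copy of the sample file (the location is an assumption).
dir=$(mktemp -d)
cat > "$dir/inventory-shipped" <<'EOF'
Jan  13  25  15 115
Feb  15  32  24 226
Mar  15  24  34 228
Apr  31  52  63 420
May  16  34  29 208
Jun  31  42  75 492
Jul  24  34  67 436
Aug  15  34  47 316
Sep  13  55  37 277
Oct  29  54  68 525
Nov  20  87  82 577
Dec  17  35  61 401
Jan  21  36  64 620
Feb  26  58  80 652
Mar  24  75  70 495
Apr  21  70  74 514
EOF

# Accumulate field 2 for every record; once the input is exhausted,
# print the number of records read (NR) and the accumulated total.
summary=$(awk '{ green += $2 } END { print NR, green }' "$dir/inventory-shipped")
printf '%s\n' "$summary"

# Remove the scratch directory again.
rm -r "$dir"
```

With the data shown above, this prints `16 331': 16 records and 331 green crates in all.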
If you are reading this in GNU Emacs using Info, you can copy the regions
of text showing these sample files into your own test files. This way you
can try out the examples shown in the remainder of this document. You do
this by using the command M-x write-region to copy text from the Info
file into a file for use with awk
(see section `Miscellaneous File Operations' in GNU Emacs Manual,
for more information). Using this information, create your own
`BBS-list' and `inventory-shipped' files, and practice what you
learn in this Info file.
If you are using the stand-alone version of Info,
see Extracting Programs from Texinfo Source Files,
for an awk program that will extract these data files from
`gawk.texi', the Texinfo source file for this Info file.