These commands operate on individual characters.
9.1 tr: Translate, squeeze, and/or delete characters.
9.2 expand: Convert tabs to spaces.
9.3 unexpand: Convert spaces to tabs.
9.1 tr: Translate, squeeze, and/or delete characters
Synopsis:
    tr [option]... set1 [set2]
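tr copies standard input to standard output, performing one of the following operations: translating, and optionally squeezing repeated characters in the result; squeezing repeated characters; deleting characters; or deleting characters and then squeezing repeated characters from the result.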
The set1 and (if given) set2 arguments define ordered sets of characters, referred to below as set1 and set2. These sets are the characters of the input that tr operates on.
The `--complement' (`-c') option replaces set1 with its
complement (all of the characters that are not in set1).
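As a small sketch of the complement option (assuming GNU tr and an ASCII sample string invented for illustration), combining `-c' with `-d' deletes every character that is not in set1:

```shell
# set1 is the digits plus newline; -c complements it, so -d removes
# everything except digits and the newline.
out=$(printf 'order 66, aisle 7\n' | tr -cd '0-9\n')
printf '%s\n' "$out"   # prints 667
```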
9.1.1 Specifying sets of characters
9.1.2 Translating: Changing one character to another.
9.1.3 Squeezing repeats and deleting
9.1.4 Warning messages
9.1.1 Specifying sets of characters
The format of the set1 and set2 arguments resembles the format of regular expressions; however, they are not regular expressions, only lists of characters. Most characters simply represent themselves in these strings, but the strings can contain the shorthands listed below, for convenience. Some of them can be used only in set1 or set2, as noted below.
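Backslash escapes
A backslash followed by a character not listed below causes an error message.
    \a    Control-G
    \b    Control-H
    \f    Control-L
    \n    Control-J
    \r    Control-M
    \t    Control-I
    \v    Control-K
    \ooo  The character with the value given by ooo, which is 1 to 3 octal digits
    \\    A backslash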
Ranges
The notation `m-n' expands to all of the characters from m through n, in ascending order. m should collate before n; if it doesn't, an error results. As an example, `0-9' is the same as `0123456789'. Although GNU tr does not support the System V syntax that uses square brackets to enclose ranges, translations specified in that format will still work as long as the brackets in set1 correspond to identical brackets in set2.
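A brief illustration of ranges, assuming GNU tr and made-up sample input. The second command uses the System V bracket form, which works here only because `[' and `]' in set1 line up with the same brackets in set2:

```shell
# 'a-f' expands to 'abcdef' and 'A-F' to 'ABCDEF'; other characters pass through.
plain=$(printf 'deadbeef cafe 0x1f\n' | tr a-f A-F)
printf '%s\n' "$plain"      # prints DEADBEEF CAFE 0x1F

# System V style: the brackets are ordinary characters that map to themselves.
bracketed=$(printf 'deadbeef [cafe]\n' | tr '[a-f]' '[A-F]')
printf '%s\n' "$bracketed"  # prints DEADBEEF [CAFE]
```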
Repeated characters
The notation `[c*n]' in set2 expands to n copies of character c. Thus, `[y*6]' is the same as `yyyyyy'. The notation `[c*]' in set2 expands to as many copies of c as are needed to make set2 as long as set1. If n begins with `0', it is interpreted in octal, otherwise in decimal.
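The two repeat forms can be tried as follows. This is a sketch assuming GNU tr; the sample strings are arbitrary:

```shell
# '[#*3]' expands to '###', so a, b and c are each replaced by '#'.
fixed=$(printf 'aabbccdd\n' | tr abc '[#*3]')
printf '%s\n' "$fixed"    # prints ######dd

# '[x*]' grows to as many copies of x as set1 (a-z) has characters.
padded=$(printf 'hello world\n' | tr a-z '[x*]')
printf '%s\n' "$padded"   # prints xxxxx xxxxx
```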
Character classes
The notation `[:class:]' expands to all of the characters in the (predefined) class class. The characters expand in no particular order, except for the upper and lower classes, which expand in ascending order. When the `--delete' (`-d') and `--squeeze-repeats' (`-s') options are both given, any character class can be used in set2. Otherwise, only the character classes lower and upper are accepted in set2, and then only if the corresponding character class (upper and lower, respectively) is specified in the same relative position in set1. Doing this specifies case conversion. The class names are given below; an error results when an invalid class name is given.
alnum
alpha
blank
cntrl
digit
graph
lower
print
punct
space
upper
xdigit
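A short sketch of character classes in use, assuming GNU tr in a locale where the sample text is plain ASCII:

```shell
# Case conversion: [:lower:] in set1 paired with [:upper:] in set2.
upper=$(printf 'Hello, World 42\n' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"       # prints HELLO, WORLD 42

# Any class may appear in set1; here [:digit:] selects the characters to delete.
no_digits=$(printf 'Hello, World 42\n' | tr -d '[:digit:]')
printf '[%s]\n' "$no_digits" # prints [Hello, World ]
```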
Equivalence classes
The syntax `[=c=]' expands to all of the characters that are equivalent to c, in no particular order. Equivalence classes are a relatively recent invention intended to support non-English alphabets. But there seems to be no standard way to define them or determine their contents. Therefore, they are not fully implemented in GNU tr; each character's equivalence class consists only of that character, which is of no particular use.
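To see that an equivalence class contains only the named character, one can compare it with the plain character. This sketch assumes GNU tr; other implementations may expand `[=e=]' differently:

```shell
# In GNU tr, '[=e=]' in set1 stands for 'e' alone, so both commands agree.
with_class=$(printf 'see here\n' | tr '[=e=]' E)
with_char=$(printf 'see here\n' | tr e E)
printf '%s\n%s\n' "$with_class" "$with_char"
```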
9.1.2 Translating
tr performs translation when set1 and set2 are both given and the `--delete' (`-d') option is not given. tr translates each character of its input that is in set1 to the corresponding character in set2. Characters not in set1 are passed through unchanged. When a character appears more than once in set1 and the corresponding characters in set2 are not all the same, only the final one is used. For example, these two commands are equivalent:
    tr aaa xyz
    tr a z
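The equivalence can be checked on a sample word (a sketch, assuming GNU tr):

```shell
# 'a' appears three times in set1; only its last mapping (a -> z) takes effect.
first=$(printf 'banana\n' | tr aaa xyz)
second=$(printf 'banana\n' | tr a z)
printf '%s %s\n' "$first" "$second"   # prints bznznz bznznz
```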
A common use of tr is to convert lowercase characters to uppercase. This can be done in many ways. Here are three of them:
    tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
    tr a-z A-Z
    tr '[:lower:]' '[:upper:]'
When tr is performing translation, set1 and set2 typically have the same length. If set1 is shorter than set2, the extra characters at the end of set2 are ignored.
On the other hand, making set1 longer than set2 is not portable; POSIX.2 says that the result is undefined. In this situation, BSD tr pads set2 to the length of set1 by repeating the last character of set2 as many times as necessary. System V tr truncates set1 to the length of set2.
By default, GNU tr handles this case like BSD tr. When the `--truncate-set1' (`-t') option is given, GNU tr handles this case like the System V tr instead. This option is ignored for operations other than translation.
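The difference between the two behaviors shows up when set1 is longer than set2. A minimal sketch, assuming GNU tr (the `-t' option may not exist in other implementations):

```shell
# Default (BSD-like): set2 'xy' is padded to 'xyyyy', so c, d and e map to y.
padded=$(printf 'abcde\n' | tr abcde xy)
printf '%s\n' "$padded"      # prints xyyyy

# With -t (System V-like): set1 is truncated to 'ab', so c, d and e pass through.
truncated=$(printf 'abcde\n' | tr -t abcde xy)
printf '%s\n' "$truncated"   # prints xycde
```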
Acting like System V tr in this case breaks the relatively common BSD idiom:
    tr -cs A-Za-z0-9 '\012'
because it converts only zero bytes (the first element in the complement of set1), rather than all non-alphanumerics, to newlines.
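A sketch of the idiom under both behaviors, assuming GNU tr and an invented sample line:

```shell
# Default padding: every non-alphanumeric byte becomes a newline, and -s
# squeezes each run of newlines, leaving one word per line.
words=$(printf 'hello, world!\n' | tr -cs A-Za-z0-9 '\012')
printf '%s\n' "$words"

# With -t the complemented set1 is truncated to its first element (the zero
# byte), so the punctuation and space are left untouched.
sysv=$(printf 'hello, world!\n' | tr -cst A-Za-z0-9 '\012')
printf '%s\n' "$sysv"
```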
9.1.3 Squeezing repeats and deleting
When given just the `--delete' (`-d') option, tr removes any input characters that are in set1.
When given just the `--squeeze-repeats' (`-s') option, tr replaces each input sequence of a repeated character that is in set1 with a single occurrence of that character.
When given both `--delete' and `--squeeze-repeats', tr first performs any deletions using set1, then squeezes repeats from any remaining characters using set2.
The `--squeeze-repeats' option may also be used when translating, in which case tr first performs translation, then squeezes repeats from any remaining characters using set2.
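The four combinations described above can be compared on one input line. This is a sketch assuming GNU tr; the sample text is arbitrary:

```shell
input='aa  bb  cc'

# -d alone: remove every 'a'.
deleted=$(printf '%s\n' "$input" | tr -d a)       # "  bb  cc"
# -s alone: collapse each run of spaces to one space.
squeezed=$(printf '%s\n' "$input" | tr -s ' ')    # "aa bb cc"
# -d and -s: delete using set1 ('a'), then squeeze using set2 (' ').
both=$(printf '%s\n' "$input" | tr -ds a ' ')     # " bb cc"
# Translate with -s: map b to c, then squeeze runs of characters in set2 ('c').
xlate=$(printf '%s\n' "$input" | tr -s b c)       # "aa  c  c"

printf '[%s]\n' "$deleted" "$squeezed" "$both" "$xlate"
```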
Here are some examples to illustrate various combinations of options:
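Remove all zero bytes:
    tr -d '\000'
Put all words on lines by themselves. This converts all non-alphanumeric characters to newlines, then squeezes each string of repeated newlines into a single newline:
    tr -cs 'a-zA-Z0-9' '[\n*]'
Convert each sequence of repeated newlines to a single newline:
    tr -s '\n'
Find doubled occurrences of words in a document. The shell script below first converts each sequence of punctuation and blank characters to a single newline, which puts each word on a line by itself. It then maps all uppercase characters to lowercase, and finally runs uniq with the `-d' option to print out only the words that were adjacent duplicates.
    #!/bin/sh
    cat "$@" \
      | tr -s '[:punct:][:blank:]' '\n' \
      | tr '[:upper:]' '[:lower:]' \
      | uniq -d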
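As a quick check of the doubled-word pipeline above, it can be wrapped in a function and fed a few lines on standard input. The function name find_doubled and the sample text are hypothetical, chosen only for this sketch, and GNU tr and uniq are assumed:

```shell
# find_doubled is an illustrative name, not part of any package.
find_doubled() {
  cat "$@" \
    | tr -s '[:punct:][:blank:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | uniq -d
}

# "is" is doubled across a line break; "the"/"The" differ only in case.
doubled=$(printf 'This is\nis a the\nThe test.\n' | find_doubled)
printf '%s\n' "$doubled"
```

With no file arguments, cat reads standard input, so the function works both in a pipeline and with file names.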
9.1.4 Warning messages
Setting the environment variable POSIXLY_CORRECT turns off the following warning and error messages, for strict compliance with POSIX.2. Otherwise, the following diagnostics are issued:
When the `--delete' option is given but `--squeeze-repeats' is not, and set2 is given, GNU tr by default prints a usage message and exits, because set2 would not be used. The POSIX specification says that set2 must be ignored in this case, but silently ignoring arguments is a bad idea.
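The default diagnostic can be observed through the exit status. This sketch assumes GNU tr with POSIXLY_CORRECT unset; it does not try to show the POSIXLY_CORRECT behavior, which may vary between versions:

```shell
# An unused set2 with -d alone is rejected; /dev/null keeps tr from
# waiting on input, and the usage message on stderr is discarded.
if (unset POSIXLY_CORRECT; tr -d a b </dev/null 2>/dev/null); then
  status=accepted
else
  status=rejected
fi
printf '%s\n' "$status"
```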
GNU tr does not provide complete BSD or System V compatibility. For example, it is impossible to disable interpretation of the POSIX constructs `[:alpha:]', `[=c=]', and `[c*10]'. Also, GNU tr does not delete zero bytes automatically, unlike traditional Unix versions, which provide no way to preserve zero bytes.
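The handling of zero bytes can be checked by counting bytes on either side of tr. A sketch assuming GNU tr and a printf that accepts octal escapes in its format string:

```shell
# Three bytes in: 'a', NUL, 'b'. Translation leaves the NUL in place.
kept=$(printf 'a\000b' | tr a-z A-Z | wc -c)
# Zero bytes disappear only when deletion is requested explicitly.
stripped=$(printf 'a\000b' | tr -d '\000' | wc -c)
printf '%d %d\n' "$kept" "$stripped"   # prints 3 2
```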
9.2 expand: Convert tabs to spaces
expand writes the contents of each given file, or standard input if none are given or for a file of `-', to standard output, with tab characters converted to the appropriate number of spaces. Synopsis:
    expand [option]... [file]...
By default, expand converts all tabs to spaces. It preserves backspace characters in the output; they decrement the column count for tab calculations. The default action is equivalent to `-8' (set tabs every 8 columns).
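A small sketch of the default tab stops, with spaces shown as dots so the padding is visible (sample input is invented):

```shell
# Line 1: the tab after 'a' pads to column 8 (7 spaces).
# Line 2: 'abcdefgh' already ends at column 8, so the tab pads to column 16.
out=$(printf 'a\tb\nabcdefgh\tc\n' | expand | tr ' ' '.')
printf '%s\n' "$out"
```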
The program accepts the following options. Also see 2. Common options.
9.3 unexpand: Convert spaces to tabs
unexpand writes the contents of each given file, or standard input if none are given or for a file of `-', to standard output, with strings of two or more space or tab characters converted to as many tabs as possible followed by as many spaces as are needed. Synopsis:
    unexpand [option]... [file]...
By default, unexpand converts only initial spaces and tabs (those that precede all characters other than spaces and tabs) on each line. It preserves backspace characters in the output; they decrement the column count for tab calculations. By default, tabs are set at every 8th column.
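The default behavior can be sketched as follows, with tabs shown as `>' so the conversion is visible (sample input is invented; GNU unexpand is assumed):

```shell
# Line 1: eight leading spaces become one tab.
# Line 2: the spaces are not at the start of the line, so they are kept.
out=$(printf '        indented\na  b\n' | unexpand | tr '\t' '>')
printf '%s\n' "$out"
```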
The program accepts the following options. Also see 2. Common options.