Copyright 2001 Bruce Barnett and General Electric Company
All rights reserved
You are allowed to print copies of this tutorial for your personal use, and link to this page, but you are not allowed to make electronic copies, or redistribute this tutorial in any form without permission.
A complete tutorial on find. How to search for a file and do something with it. How to prevent find from searching certain directories.
One of the commands that everyone should master is the find command. The first, and most obvious, use is find's ability to locate old, big, or unused files, or files that you forgot where they are.
The other important characteristic is find's ability to travel down subdirectories. If you wanted a recursive directory list, and ls doesn't have this option, use find.
Normally the shell provides the argument list to a command. That is, Unix programs are frequently given file names and not directory names. Only a few programs can be given a directory name and march down the directory searching for subdirectories. The programs find, tar, du, and diff do this. The commands chmod, chgrp, rm, and cp will, but only if a -r or -R option is specified. In SunOS 4.1, ls supports the -R recursive option.
In general, most commands do not understand directory structures, and rely on the shell to expand meta-characters to directory names. That is, if you wanted to delete all object files ending with a ".o" in a group of directories, you could type
Not only is this tedious to type, it may not find all of the files you are searching for. The shell has certain blind spots. It will not match files in directories starting with a period. And, if any files match */*/*/*.o, they would not be deleted.
Another problem, although one that occurs less frequently in SunOS 4.0 and above, is typing the above, and getting the error "Arguments too long." This means the number of arguments on the list was too large for the shell.
The answer to these problems is using find.
A simple example of find is using it to print the names of all of the files in the directory and all subdirectories. This is done with the simple command
The first argument to find is a directory or file name. The arguments after the path names always start with a minus sign, and tells find what to do once it finds a file. These are the search options. In this case, the file name is printed. Any number of directories can be specified. You can use the tilde character supported by the C shell, as well as particular paths.
which will list every file on the system.
Excuse me. I just spoke an untruth. Every directory always contains the files . and ... Find could report these obvious files, but thankfully doesn't.
Find sends its output to standard output. One way to use this is with the back quote command substitution:
The find command is executed, and its output replaces the back quoted string. Ls sees the output of find, and doesn't even know find was used.
You can look for particular files by using a regular expression with meta-characters as an argument to the -name option. The shell also understands these meta-characters, as it also understands regular expressions. It is necessary to quote these special characters, so they are passed to find unchanged. Either the back quote or single quotes can be used:
The path of the file is not matched with the -name option, merely the name in the directory without the path leading to the file.
If you are only interested in files of a certain type, use the -type argument, followed by one of the following characters:
+---------------------------------------------------+ |Character Meaning | +---------------------------------------------------+ |b Block special file (see mknode(8)) | |c Character special file (see mknode(8)) | |d Directory | |f Plain file | |p Named Pipe File | |l Symbolic link | |s Socket | +---------------------------------------------------+
Unless you are a system administrator, the important types are directories, plain files, or symbolic links (i.e. types d, f, or l).
Using the -type option, another way to recursively list files is:
It can be difficult to keep track of all of the symbolic links in a directory. The next command will find all of the symbolic links in your home directory, and print the files your symbolic links point to.
Find has several options that take a decimal integer. One such argument is -size. The number after this argument is the size of the files in disk blocks. Unfortunately, this is a very vague number. Earlier versions of Unix used disk blocks of 512 bytes. Newer versions allow larger block sizes, so a "block" of 512 bytes is misleading.
This confusion is aggravated when the command ls -s is used. The -s option lists the size of the file in blocks. If the command is "/usr/bin/ls," the block size is 1024 bytes. If the command is "/usr/5bin/ls," the block size is 512 bytes.
Let me confuse you some more. The following shows the two versions of ls:
Can you guess what block size should be specified so that find prints this file? The correct command is:
because the actual size is between 26 and 16 blocks of 512 bytes each. As you can see, "ls -s" is not an accurate number for find. You can put a c after the number, and specify the size in bytes,
To search for files using a range of file sizes, a minus or plus sign can be specified before the number. The minus sign means "less than," and the plus sign means "greater than." This next example lists all files that are greater than 10,000 bytes, but less than 32,000 bytes:
When more than one qualifier is given, both must be true.
An alternate way is to specify a range of times:
Mtime is the last modified time of a file. You can also think of this as the creation time of the file, as Unix does not distinguish between creation and modification. If you want to look for files that have not been used, check the access time with the -atime argument. A command to list all files that have not be read in thirty days or more is
It is difficult to find directories that have not been accessed because the find command modifies the directory's access time.
There is another time associated with each file, called the ctime, accessed with the -ctime option. This will have a more recent value if the owner, group, permission or number of links is changed, while the file itself does not. If you want to search for files with a specific number of links, use the -linksoption.
Find can look for files with a specific permission. It uses an octal number for these permissions. The string rw-rw-r--, indicates you and members of your group have read and write permission, while the world has read only priviledge. The same permissions, when expressed as an octal number, is 664. To find all "*.o" files with the above permission, use:
If you want to see if you have any directories with world write permission, use:
This only matches the exact combination of permissions. If you wanted to find all directories with group write permission, there are several combinations that can match. You could list each combination, but find allows you to specify a pattern that can be bit-wise ANDed with the permissions of the file. Simply put a minus sign before the octal value. The group write permission bit is octal 20, so the following negative value:
will match the following common permissions:
+-------------------------+ |Permission Octal value | +-------------------------+ |rwxrwxrwx 777 | |rwxrwxr-x 775 | |rw-rw-rw- 666 | |rw-rw-r-- 664 | |rw-rw---- 660 | +-------------------------+
If you wanted to look for files that you can execute, (i.e. shell scripts or programs), you want to match the pattern "--x------," by typing:
When the -perm argument has a minus sign, all of the permission bits are examined, including the set user ID bits.
Often you need to look for a file that has certain permissions and belonging to a certain user or group. This is done with the -user and -group search options. To find all files that are set user ID to root, use:
To find all files that are set group ID to staff, use:
Instead of using a name or group in /etc/passwd or /etc/group, you can use a number:
Often, when a user leaves a site, their account is deleted, but their files are still on the computer. A system manager can use the -nouser or -nogroup to find files with an unknown user or group ID.
So far, after find has found a file, all it has done is printed the filename. It can do much more than that, but the syntax can get hairy. Using xargs saves you this mental effort, but it isn't always the best solution.
If you want a recursive listing, find's output can be piped into | xargs ls -l but it is more efficient to use the built in -ls option:
This is similar to the command:
You could also use ls -R command, but that would be too easy.
which do the same as
I have already discussed how to use xargs to execute commands. Find can execute commands directly. The syntax is peculiar, which is one reasons I recommend xargs. The syntax of the -exec option allows you to execute any command, including another find command. If you consider that for a moment, you realize that find needs some way to distinguish the command it's executing from its own arguments. The obvious choice is to use the same end of command character as the shell (i.e. the semicolon). Since the shell normally uses the semicolon itself, it is necessary to "escape" the character with a backslash or quotes. There is one more special argument that find treats differently: {}. These two characters are used as the variable whose name is the file find found. Don't bother re-reading that last line. An example will clarify the usage. The following is a trivial case, and uses the -exec option to mimic the "-print' option.
The C shell uses the characters { and }, but doesn't change {}, which is why it is not necessary to quote these characters. The semicolon must be quoted, however. Quotes can be used instead of a backslash:
as both will pass the semicolon past the shell to the find command. As I said before, find can even call find. If you wanted to list every symbolic link in every directory owned by group "staff" you could execute:
To search for all files with group write permission, and remove the permission, you can use
or
The difference between -exec and xargs are subtle. The first one will execute the program once per file, while xargs can handle several files with each process. However, xargs may have problems with files that contain embedded spaces.
Occasionally people create a strange file that they can't delete. This could be caused by accidentally creating a file with a space or some control character in the name. Find and -exec can delete this file, while xargs could not. In this case, use ls -il to list the files and i-nodes, and use the -inum option with -exec to delete the file:
If you wish, you can use -ok which does the same as -exec, except the program asks you first to confirm the action before executing the command. It is a good idea to be cautious when using find, because the program can make a mistake into a disaster. When in doubt, use echo as the command. Or send the output to a file and examine the file before using the file as input to xargs. This is how I discovered that find can only use one {} in the arguments to -exec. I wanted to rename some files using "-exec mv {} {}.orig" but I learned that I have to write a shell script that I told find to execute.
A better solution is to create a file as the first step in upgrading. I usually call this FirstFile. Find has a -newer option that tests each file and compares the modification date to the newer file. If you then wanted to list all files in /usr that need to be saved when the operating system is upgraded, use:
This could then be used to create a tar or cpio file that would be restored after the upgrade.
The "and" function is performed by the -a option. This is not needed, as two or more options are ANDed automatically. The "or" function is done with the -o option. Parenthesis can be used to add precedence. These must also be escaped. If you wanted to print object and "a.out" files that are older than 7 days, use:
The most painful aspect of a large NFS environment is avoiding the access of files on NFS servers that are down. Find is particularly sensitive to this, because it is very easy to access dozens of machines with a single command. If find tries to explore a file server that happens to be down, it will time out. It is important to understand how to prevent find from going too far.
The important option in this case is -prune. This option confuses people because it is always true. It has a side-effect that is important. If the file being looked at is a directory, it will not travel down the directory. Here is an example that lists all files in a directory but does not look at any files in subdirectories under the top level:
This will print all plain files and prune the search at all directories. To print files except for those in a Source Code Control Directories, use:
If the -o option is excluded, the SCCS directory will be printed along with the other files.
Another useful combination is using -prune with -fstype or -xdev. Fstype tests for file system types, and expects an argument like nfs or 4.2. The later refers to the file system introduced in the 4.2 release of the Berkeley Software Distribution. To limit find to files only on a local disk or disks, use the clause -fstype 4.2 -prune or -o -fstype nfs -prune. If you needed to limit the search to one particular disk partition, use -xdev, The later is very useful if you want to help a congested disk partition, and wanted to look for all files greater than 40 blocks on the current disk partition;
There is a new feature in find that makes a dramatic improvement is searching for files. This typically only works if you are searching for a file on a local disk. The system manager must run a script called updatedb once a night using cron. This creates a database called /usr/lib/find/find.codes.
When find is executed with a single argument, it looks for the file using the database, and reports where the file is. You can create this database yourself, if you wish, as find will look for the database at any location the environment variable FCODES defines.
This document was translated by troff2html v0.21 on September 22, 2001.