
Pipes: A Brief Introduction



A pipe is a form of redirection that is used in Linux and other Unix-like operating systems to send the output of one program to another program for further processing.

Redirection is the transferring of standard output to some other destination, such as another program, a file or a printer, instead of the display monitor (which is its default destination). Standard output, sometimes abbreviated stdout, is the destination of the output from command line (i.e., all-text mode) programs in Unix-like operating systems.

Pipes are used to create what can be visualized as a pipeline of commands, which is a temporary direct connection between two or more simple programs. This connection makes it possible to perform some highly specialized task that none of the constituent programs could perform by itself. A command is merely an instruction provided by a user telling a computer to do something, such as launch a program. The command line programs that do the further processing are referred to as filters.

This direct connection lets the programs operate simultaneously and pass data between them continuously. The data does not have to go through temporary text files or the display screen, and no program has to wait for the previous one to finish before it begins.
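The simultaneous operation described above can be seen in a minimal sketch. It assumes the standard yes and head utilities; the variable name first_three is only illustrative. yes never stops on its own, so the pipeline can finish only because head reads its output while yes is still running:

```shell
# yes prints "y" forever; head exits after reading three lines.
# Because both programs run at once, the pipeline finishes instead of hanging.
first_three=$(yes | head -n 3)
printf '%s\n' "$first_three"
```

If the output of yes had to be collected in full before head started, this command would never complete.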


History

Pipes rank alongside the hierarchical file system and regular expressions as one of the most powerful yet elegant features of Unix-like operating systems. The hierarchical file system is the organization of directories in a tree-like structure which has a single root directory (i.e., a directory that contains all other directories). Regular expressions are a pattern matching system that uses strings (i.e., sequences of characters) constructed according to pre-defined syntax rules to find desired patterns in text.

Pipes were first suggested by M. Doug McIlroy, when he was a department head in the Computing Science Research Center at Bell Labs, the research arm of AT&T (American Telephone and Telegraph Company), the former U.S. telecommunications monopoly. McIlroy had been working on macros since the latter part of the 1950s, and he was a ceaseless advocate of linking macros together as a more efficient alternative to series of discrete commands. A macro is a series of commands (or keyboard and mouse actions) that is performed automatically when a certain command is entered or key(s) pressed.

McIlroy's persistence led Ken Thompson, who developed the original UNIX at Bell Labs in 1969, to rewrite portions of his operating system in 1973 to include pipes. This implementation of pipes was not only extremely useful in itself, but it also made possible a central part of the Unix philosophy, the most basic concept of which is modularity (i.e., a whole that is created from independent, replaceable parts that work together efficiently).


Examples

A pipe is designated in commands by the vertical bar character, which is located on the same key as the backslash on U.S. keyboards. The general syntax for pipes is:

command_1 | command_2 [| command_3 . . . ]

This chain can continue for any number of commands or programs.
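As a small illustration of a longer chain, the following sketch joins four commands with three pipes. The input words and the variable name unique_upper are made up for the example:

```shell
# Four commands joined by three pipes: sort the words, drop adjacent
# duplicates, then convert what remains to upper case.
unique_upper=$(printf 'pear\napple\npear\nfig\n' | sort | uniq | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$unique_upper"
```

Each command reads only what the previous one wrote, so stages can be added or removed without changing the others.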

A very simple example of the benefits of piping is provided by the dmesg command, which prints the kernel's message buffer, including the startup messages that scroll through the console (i.e., the all-text, full-screen display) while Linux is booting (i.e., starting up). dmesg by itself produces far too many lines of output to fit into a single screen; thus, its output scrolls down the screen at high speed and only the final screenful of messages is easily readable. However, by piping the output of dmesg to the filter less, the startup messages can conveniently be viewed one screenful at a time, i.e.,

dmesg | less

less allows the output of dmesg to be moved forward one screenful at a time by pressing the SPACE bar and back one screenful at a time by pressing the b key. The command can be terminated by pressing the q key. (The more command could have been used here instead of less; however, less is newer than more and has additional functions, including the ability to return to previous pages of the output.)

The same result could be achieved by first redirecting the output of dmesg to a temporary file and then displaying the contents of that file on the monitor. For example, the following set of two commands uses the output redirection operator (designated by a rightward facing angle bracket) to first send the output of dmesg to a text file called tempfile1 (which will be created by the output redirection operator if it does not already exist), and then it uses less to display the contents of tempfile1 on the screen:

dmesg > tempfile1
less tempfile1

However, redirection to a file as an intermediate step is clearly less efficient, both because two separate commands are required and because the second command must await the completion of the first command before it can begin.
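The two routes can be compared side by side in a rough sketch. Because dmesg may need special privileges and less is interactive, the sketch substitutes a hypothetical produce function for dmesg and sort -n for less; those names and the sample numbers are assumptions for illustration only:

```shell
# Stand-in producer: prints five numbers in descending order.
produce() { printf '%s\n' 5 4 3 2 1; }

# Route 1: through a temporary file (two steps, run one after the other).
tmpfile=$(mktemp)
produce > "$tmpfile"
via_file=$(sort -n < "$tmpfile")
rm -f "$tmpfile"

# Route 2: through a pipe (one step, both programs run together).
via_pipe=$(produce | sort -n)

[ "$via_file" = "$via_pipe" ] && echo "same result"
```

Both routes give the same output, but the file route also leaves a temporary file that has to be removed afterwards.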

The use of two pipes to chain three commands together could make the above example even more convenient for some situations. For example, the output of dmesg could first be piped to the sort filter to arrange it into alphabetic order before piping it to less:

dmesg | sort -f | less

The -f option tells sort to disregard case (i.e., whether letters are lower case or upper case) while sorting.
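The effect of -f can be checked on a few sample words, as in the sketch below. The words and variable names are invented for the example, and LC_ALL=C is set only to make the contrast reproducible, since in many other locales plain sort already ignores case:

```shell
# LC_ALL=C forces plain byte-order comparison, in which all upper case
# letters sort before all lower case letters.
case_sensitive=$(printf 'Banana\napple\nCherry\n' | LC_ALL=C sort)
case_folded=$(printf 'Banana\napple\nCherry\n' | LC_ALL=C sort -f)
printf 'without -f:\n%s\n\nwith -f:\n%s\n' "$case_sensitive" "$case_folded"
```

Without -f, apple is listed last because of its lower case first letter; with -f it is listed first.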

Likewise, the output of the ls command (which is used to list the contents of a directory) is commonly piped to the less (or more) command to make the output easier to read, i.e.,

ls -al | less

or

ls -al | more

ls reports the contents of the current directory (i.e., the directory in which the user is currently working) in the absence of any arguments (i.e., input data in the form of the names of files or directories). The -l option tells ls to provide detailed information about each item, and the -a option tells ls to include all files, including hidden files (i.e., files that are normally not visible to users). Because ls returns its output in alphabetic order by default, it is not necessary to pipe its output to the sort command (unless it is desired to perform a different type of sorting, such as reverse sorting, in which case sort's -r option would be used).
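The reverse sorting mentioned above might look like the following sketch. It works in a scratch directory created with mktemp so that the result does not depend on the reader's own files; the directory and file names are hypothetical:

```shell
# Create a scratch directory with three empty files.
demo_dir=$(mktemp -d)
touch "$demo_dir/alpha" "$demo_dir/bravo" "$demo_dir/charlie"

# ls writes one name per line when its output goes to a pipe;
# sort -r then puts the names in reverse alphabetic order.
reversed=$(ls "$demo_dir" | sort -r)
printf '%s\n' "$reversed"

rm -rf "$demo_dir"
```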

This could just as easily be done for any other directory. For example, the following would list the contents of the /bin directory (which contains user commands) in a convenient paged format:

ls -al /bin | less

The following example employs a pipe to combine the ls and the wc (i.e., word count) commands in order to show how many filesystem objects (i.e., files, directories and links) are in the current directory:

ls | wc -l

ls lists each object, one per line, and this list is then piped to wc, which, when used with its -l option, counts the number of lines and writes the result to standard output (which, as usual, is by default the display screen).
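One detail worth checking is that plain ls leaves out hidden files, so they are not counted. The sketch below, run in a hypothetical scratch directory, compares the count from ls with the count from ls -A. The -A option is an addition not used in the text above; it is chosen instead of -a because -a would also list the . and .. entries and inflate the count by two:

```shell
# Scratch directory with two visible files and one hidden file.
demo_dir=$(mktemp -d)
touch "$demo_dir/one" "$demo_dir/two" "$demo_dir/.hidden"

visible_count=$(ls "$demo_dir" | wc -l)    # hidden file is skipped
all_count=$(ls -A "$demo_dir" | wc -l)     # hidden file is included

echo "visible: $visible_count, including hidden: $all_count"
rm -rf "$demo_dir"
```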

The output from a pipeline of commands can be just as easily redirected to a file (where it is written to that file) or a printer (where it is printed on paper). In the case of the above example, the output could be redirected to a file named, for instance, count.txt:

ls | wc -l > count.txt

The output redirection operator will create count.txt if it does not exist or overwrite it if it already exists. (The file does not, of course, require the .txt extension, and it could have just as easily been named count, lines or anything else.)
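The create-or-overwrite behavior can be confirmed with a short sketch that writes to the same file twice. It uses a scratch directory and made-up input rather than a real directory listing, so the counts are predictable:

```shell
out_dir=$(mktemp -d)

# First run: count.txt does not exist yet, so it is created.
printf 'a\nb\nc\n' | wc -l > "$out_dir/count.txt"

# Second run: count.txt exists, so its old contents are replaced.
printf 'a\nb\n' | wc -l > "$out_dir/count.txt"

final_count=$(cat "$out_dir/count.txt")
echo "count.txt now holds: $final_count"
rm -rf "$out_dir"
```

Only the result of the second pipeline remains in the file.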

The following is a slightly more complex example of combining a pipe with redirection to a file:

echo -e "orange \npeach \ncherry" | sort > fruit

The echo command sends the text that follows it to standard output, and its -e option (supported by the echo built into bash, among others) tells echo to interpret each \n as the newline symbol (which is used to start a new line in the output). The pipe sends the output from echo -e to the sort command, which arranges it alphabetically, after which it is redirected by the output redirection operator to the file fruit.
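Not every shell's echo accepts -e (the echo in dash, for example, prints the -e literally), so a sketch of the same pipeline using printf, which interprets \n in its format string without any option, may be more portable. The scratch directory and variable names are assumptions made for the example:

```shell
out_dir=$(mktemp -d)

# printf emits the three names on separate lines; sort orders them
# and the redirection operator writes the result to the file fruit.
printf 'orange\npeach\ncherry\n' | sort > "$out_dir/fruit"

fruit_sorted=$(cat "$out_dir/fruit")
printf '%s\n' "$fruit_sorted"
rm -rf "$out_dir"
```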

As a final example, and to further illustrate the great power and flexibility that pipes can provide, the following uses three pipes to search the contents of all of the files in the current directory and display the total number of lines in them that contain the string Linux but not the string UNIX:

cat * | grep "Linux" | grep -v "UNIX" | wc -l

In the first of the four segments of this pipeline, the cat command, which is used to read and concatenate (i.e., string together) the contents of files, concatenates the contents of all of the files in the current directory. The asterisk is a wildcard that represents all items in a specified directory, and in this case it serves as an argument to cat to represent all objects in the current directory.

The first pipe sends the output of cat to the grep command, which is used to search text. The Linux argument tells grep to return only those lines that contain the string Linux. The second pipe sends these lines to another instance of grep, which, in turn, with its -v option, eliminates those lines that contain the string UNIX. Finally, the third pipe sends this output to wc -l, which counts the number of lines and writes the result to the display screen.
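To try the pipeline on known input, the following sketch builds a scratch directory with two small files whose names and contents are invented for the example. Of the four lines, two contain Linux without UNIX:

```shell
text_dir=$(mktemp -d)
printf 'Linux is a kernel\nUNIX came first\n' > "$text_dir/a.txt"
printf 'Linux and UNIX differ\nI use Linux daily\n' > "$text_dir/b.txt"

# Same four-stage pipeline as above, pointed at the scratch directory.
match_count=$(cat "$text_dir"/* | grep "Linux" | grep -v "UNIX" | wc -l)
echo "matching lines: $match_count"

rm -rf "$text_dir"
```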


"Fake Pipes"

A notation similar to the pipes of Unix-like operating systems is used in Microsoft's MS-DOS operating system. However, the method of implementation is completely different. Sometimes the pipe-like mechanism used in MS-DOS is referred to as fake pipes because, instead of running two or more programs simultaneously and channeling the output data from one continuously to the next, MS-DOS uses a temporary file on disk that first accumulates the entire output from the first program and only then feeds its contents to the next program.

This more closely resembles redirection through a file than it does the Unix concept of pipes. It takes more time because the second program cannot begin until the first has been completed, and it also consumes more system resources (i.e., memory and processor time). This approach could be particularly disadvantageous if the first command produces a very large amount of output and/or does not terminate.






Created April 29, 2004. Last updated August 23, 2006.
Copyright © 2004 - 2006 The Linux Information Project. All Rights Reserved.