Pipes in Linux

Interprocess communication

A process is an active operating system entity that executes a program. Normally a process, like a specialist, does one particular job (well). Real-life workflows are complex, and multiple processes often collaborate to accomplish an objective. To work together, processes need to exchange data, and so there are various interprocess communication (IPC) mechanisms. One of the most fundamental IPC mechanisms is the pipe, which carries data sequentially from one process to the next in a pipeline.

Pipes
System calls
- pipe system call
- dup system calls
Example Program
See also

1.0 Pipes

Two processes can be joined by the pipe symbol (|) on the shell command line. The standard output of the first process becomes the standard input for the second process. For example,

$ ls -ls | more

The standard output of ls becomes the standard input for more. Individually, both ls and more are oblivious of the fact that the respective standard output or standard input is not to or from the default device but is going to or coming from another process. Conceptually, two processes connected with a pipe look like this,

[Figure: Two processes connected with a pipe]

P1 and P2 execute concurrently, and P1 passes data to P2 as it executes. The pipe system call returns two file descriptors (int pfd [2]): one for reading from the pipe (pfd [0]) and the other for writing to the pipe (pfd [1]). Using the respective file descriptor, one can use the read or write system call to read from or write to a pipe, just like a file.
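
As a rough illustration of reading from and writing to a pipe "just like a file", here is a minimal sketch that stays within a single process. The helper name pipe_round_trip and its parameters are made up for this illustration, and it assumes the message is short enough to fit in the pipe's buffer, so the write does not block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a fresh pipe and read it back.
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_round_trip (const char *msg, char *buf, size_t bufsize)
{
    int pfd [2];
    size_t len = strlen (msg);
    ssize_t n;

    if (bufsize == 0 || pipe (pfd) == -1)
        return -1;

    /* pfd [1] is the write end; a short message fits in the pipe's
       buffer, so this write completes although nobody is reading yet */
    if (write (pfd [1], msg, len) != (ssize_t) len) {
        close (pfd [0]);
        close (pfd [1]);
        return -1;
    }
    close (pfd [1]);    /* done writing */

    /* pfd [0] is the read end; leave room for the terminating NUL */
    n = read (pfd [0], buf, bufsize - 1);
    close (pfd [0]);
    if (n >= 0)
        buf [n] = '\0';
    return n;
}
```

Calling pipe_round_trip ("hello", buf, sizeof buf) with a 32-byte buf should leave the string "hello" in buf. In a real pipeline the writer and the reader are different processes, which is what the rest of this article builds up to.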

2.0 System calls

There are two system calls which are relevant here – the pipe system call and the dup system call.

2.1 pipe system call

The pipe system call is,

int pipe (int pipefd [2]);

After the pipe system call executes, the array pipefd contains two file descriptors: pipefd [0] is for reading from the pipe and pipefd [1] is for writing to the pipe. pipe returns 0 on success and -1 on error. A process that holds both descriptors can, in principle, both write to and read from the same pipe. Generally, though, a process uses only one end. So, if a process uses the pipe for reading, it closes the write file descriptor, and vice versa. Closing unused write descriptors matters: a reader sees end-of-file only after every write descriptor for the pipe has been closed.
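
The convention of closing the unused end can be sketched as follows. This is only an illustration under stated assumptions: the helper name read_from_child is invented here, the child is the writer and the parent is the reader, and error handling is kept to a minimum.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes msg into a pipe and
   collect the data in the parent.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_from_child (const char *msg, char *buf, size_t bufsize)
{
    int pfd [2];
    pid_t pid;
    size_t total = 0;
    ssize_t n;

    if (bufsize == 0 || pipe (pfd) == -1)
        return -1;

    pid = fork ();
    if (pid == -1) {
        close (pfd [0]);
        close (pfd [1]);
        return -1;
    }

    if (pid == 0) {
        /* child: writer only, so close the read end */
        close (pfd [0]);
        ssize_t w = write (pfd [1], msg, strlen (msg));
        close (pfd [1]);
        _exit (w == -1 ? 1 : 0);
    }

    /* parent: reader only, so close the write end; if it stayed open,
       read would never report end-of-file */
    close (pfd [1]);
    while ((n = read (pfd [0], buf + total, bufsize - 1 - total)) > 0)
        total += (size_t) n;
    close (pfd [0]);
    buf [total] = '\0';

    if (waitpid (pid, NULL, 0) == -1 || n == -1)
        return -1;
    return (ssize_t) total;
}
```

The read loop ends when read returns 0, which happens once the child has closed its write descriptor and the parent has closed its own copy, or when the buffer is full.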

2.2 dup system calls

The dup system calls duplicate file descriptors. You pass an open file descriptor; dup finds the lowest-numbered file descriptor that is currently unused, makes it refer to the same open file (or pipe), and returns it to the caller.

int dup (int oldfd);
int dup2 (int oldfd, int newfd);

dup returns the lowest-numbered unused file descriptor. So, suppose file descriptor 0 is still open, you close file descriptor 1 (standard output), and you have another file descriptor (fd) open to a file. If you then call dup with fd as the argument, dup returns 1, and further writes to standard output go to the file originally opened with fd. It is somewhat simpler to use the dup2 system call, which takes the oldfd and newfd file descriptors as parameters. dup2 makes newfd a copy of oldfd, first closing newfd if it was already open.
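
To make the redirection concrete, here is a minimal sketch that points standard output at a file with dup2 and then restores it. The helper name write_via_stdout, the file mode 0644 and the use of dup to keep a backup of standard output are choices made for this illustration, not something the pipeline program below depends on.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: temporarily point standard output at the file
   `path`, write `text` to it, then restore standard output.
   Returns 0 on success, -1 on error. */
int write_via_stdout (const char *path, const char *text)
{
    size_t len = strlen (text);
    int result = -1;
    int saved, fd;

    fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* dup picks the lowest-numbered unused descriptor; keep it as a
       backup of the current standard output */
    saved = dup (STDOUT_FILENO);
    if (saved == -1) {
        close (fd);
        return -1;
    }

    /* dup2 closes descriptor 1 if it is open and makes it refer to
       the same open file as fd */
    if (dup2 (fd, STDOUT_FILENO) != -1) {
        if (write (STDOUT_FILENO, text, len) == (ssize_t) len)
            result = 0;
        /* put the original standard output back */
        if (dup2 (saved, STDOUT_FILENO) == -1)
            result = -1;
    }

    close (saved);
    close (fd);
    return result;
}
```

The sketch uses write rather than printf so that stdio buffering does not delay the output until after standard output has been restored.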

3.0 Example Program

In this example, we will write code for a process which creates the pipeline,

who | cut -f1 -d' ' | uniq

It is worth noting that two arbitrary, unrelated processes cannot communicate using a pipe of this kind. The pipe has to be set up by a common ancestor, typically the parent process, and the children just use it (often, without knowing about it).

/* 
     pipeline.c : create the pipeline 

                  who | cut -f1 -d' ' | uniq

*/

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main (int argc, char **argv) 
{
    int pfd1 [2], pfd2 [2];
    pid_t pid1, pid2, pid3;

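    if (pipe (pfd1) == -1) {
        perror ("pipe");
        exit (EXIT_FAILURE);
    }
    
    pid1 = fork ();

    if (pid1 == -1) {
        perror ("fork");
        exit (EXIT_FAILURE);
    }

    if (pid1 == 0) {
        // first child, will become "who"
        if (dup2 (pfd1 [1], STDOUT_FILENO) == -1)
            perror ("dup2");
        if (close (pfd1 [0]) == -1)
            perror ("close");
        if (close (pfd1 [1]) == -1)
            perror ("close");
        execlp ("who", "who", (char *) NULL);
        // reached only if execlp fails; do not fall through into the parent's code
        perror ("execlp");
        _exit (EXIT_FAILURE);
    }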
    
    /* parent process */
    if (pipe (pfd2) == -1) {
        perror ("pipe");
        exit (EXIT_FAILURE);
    }

    pid2 = fork ();

    if (pid2 == -1) {
        perror ("fork");
        exit (EXIT_FAILURE);
    }

    if (pid2 == 0) {
        // second child, will become "cut"
        if (dup2 (pfd1 [0], STDIN_FILENO) == -1)
            perror ("dup2");
        if (dup2 (pfd2 [1], STDOUT_FILENO) == -1)
            perror ("dup2");
        if (close (pfd1 [0]) == -1)
            perror ("close");
        if (close (pfd1 [1]) == -1)
            perror ("close");
        if (close (pfd2 [0]) == -1)
            perror ("close");
        if (close (pfd2 [1]) == -1)
            perror ("close");
        execlp ("cut", "cut", "-f1", "-d ", (char *) NULL);
        // reached only if execlp fails; do not fall through into the parent's code
        perror ("execlp");
        _exit (EXIT_FAILURE);
    }
    
    /* parent process */
    
    if (close (pfd1 [0]) == -1)
        perror ("close");
    if (close (pfd1 [1]) == -1)
        perror ("close");

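    pid3 = fork ();

    if (pid3 == -1) {
        perror ("fork");
        exit (EXIT_FAILURE);
    }

    if (pid3 == 0) {
        // third child, will become "uniq"
        if (dup2 (pfd2 [0], STDIN_FILENO) == -1)
            perror ("dup2");
        if (close (pfd2 [0]) == -1)
            perror ("close");
        if (close (pfd2 [1]) == -1)
            perror ("close");
        execlp ("uniq", "uniq", (char *) NULL);
        // reached only if execlp fails; do not fall through into the parent's code
        perror ("execlp");
        _exit (EXIT_FAILURE);
    }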

    /* parent process */

    if (close (pfd2 [0]) == -1)
        perror ("close");
    if (close (pfd2 [1]) == -1)
        perror ("close");

    /* wait for all three children to finish */
    if (waitpid (pid1, NULL, 0) == -1)
        perror ("waitpid");

    if (waitpid (pid2, NULL, 0) == -1)
        perror ("waitpid");

    if (waitpid (pid3, NULL, 0) == -1)
        perror ("waitpid");

    return EXIT_SUCCESS;
}

We can compile and run the above program.

$ gcc pipeline.c -o pipeline
$ ./pipeline
alice
bob
carol

The parent process creates a pipe. Now, we must remember that much of a process's system data, such as its open file descriptors and its current directory, is inherited by the child process on fork and is preserved across the exec system calls. (Accumulated CPU time is a partial exception: it is reset to zero in the child on fork, though it too survives exec.) So, when a parent makes a pipe, forks a child, and the child then execs a program, that program still has the pipe file descriptors. In fact, before calling exec, the child, which is still running the parent's code, duplicates the pipe file descriptor it needs onto the standard input or standard output file descriptor and closes the original pipe file descriptors. The new program reads from the standard input or writes to the standard output as usual but, courtesy of the parent's code, it is actually reading from or writing to the pipe.