Pipes in Linux

1.0 Interprocess communication

A process is an active operating system entity which executes a program. Normally, a process, like a specialist, does one particular job well. Real-life workflows are complex, and we often have multiple processes collaborating to accomplish an objective. To work together, processes need to exchange data, so the operating system provides various interprocess communication (IPC) mechanisms. One of the most fundamental IPC mechanisms is the pipe, which carries data sequentially from one process to the next in a pipeline.

2.0 Pipes

Two processes can be joined by the pipe symbol (|) on the shell command line. The standard output of the first process becomes the standard input for the second process. For example,

$ ls -ls | more

The standard output of ls becomes the standard input for more. Individually, both ls and more are oblivious of the fact that their standard output or standard input is not connected to the default device but is going to, or coming from, another process. Conceptually, two processes P1 and P2 connected with a pipe form a one-way channel: P1 writes into one end of the pipe and P2 reads from the other.

Both P1 and P2 execute concurrently and P1 passes data to P2 as it executes. The pipe system call returns two file descriptors in an array (int pfd [2]): one for writing to the pipe (pfd [1]) and the other for reading from the pipe (pfd [0]). Using the respective file descriptor, one can use the read or write system call to read from or write to a pipe, just like a file.

3.0 System calls

Two system calls are relevant here: pipe and dup (along with its variant, dup2).

3.1 pipe system call

The pipe system call is,

int pipe (int pipefd [2]);

After the pipe system call executes, the array pipefd contains two file descriptors: pipefd [0] is for reading from the pipe and pipefd [1] is for writing to the pipe. A pipe is unidirectional, so data written to pipefd [1] can only be read from pipefd [0]. A single process could keep both descriptors open and both write to and read from the same pipe, but generally a process uses only one end. So, if a process uses the pipe for reading, it closes the write file descriptor, and vice versa. Closing unused write ends matters: a reader sees end-of-file only after every write descriptor of the pipe has been closed.

3.2 dup system calls

The dup system calls duplicate file descriptors. You pass a file descriptor, and dup finds a file descriptor which is currently unused, makes it refer to the same open file (or pipe) and returns it to the caller.

int dup (int oldfd);
int dup2 (int oldfd, int newfd);

dup returns the lowest-numbered file descriptor that is not currently in use. So, if you close file descriptor 1 (standard output) while file descriptor 0 is still open, and then call dup with another file descriptor (fd) that is open to a file, dup returns 1, and further writes to standard output go to the file originally opened as fd. It is somewhat simpler to use the dup2 system call, which takes the oldfd and newfd file descriptors as parameters. dup2 makes newfd a copy of oldfd, first closing newfd if it was already open.

4.0 Example

In this example, we will write code for a process which creates the pipeline,

who | cut -f1 -d' ' | uniq

It is worth noting that two arbitrary, unrelated processes cannot communicate using a pipe. The pipe has to be set up by a common ancestor, typically the parent process, and the children simply use the descriptors they inherit (often without knowing that a pipe is involved).

/* 
     pipeline.c : create the pipeline 

                  who | cut -f1 -d' ' | uniq

*/

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* print the error for the failed call and terminate the process */
static void die (const char *msg)
{
    perror (msg);
    exit (EXIT_FAILURE);
}

int main (void)
{
    int pfd1 [2], pfd2 [2];
    pid_t pid1, pid2, pid3;

    /* pfd1 connects "who" to "cut" */
    if (pipe (pfd1) == -1)
        die ("pipe");

    if ((pid1 = fork ()) == -1)
        die ("fork");

    if (pid1 == 0) {
        // first child, will become "who"
        if (dup2 (pfd1 [1], STDOUT_FILENO) == -1)
            die ("dup2");
        if (close (pfd1 [0]) == -1)
            die ("close");
        if (close (pfd1 [1]) == -1)
            die ("close");
        execlp ("who", "who", (char *) NULL);
        die ("execlp");    // reached only if execlp fails
    }

    /* parent process: pfd2 connects "cut" to "uniq" */
    if (pipe (pfd2) == -1)
        die ("pipe");

    if ((pid2 = fork ()) == -1)
        die ("fork");

    if (pid2 == 0) {
        // second child, will become "cut"
        if (dup2 (pfd1 [0], STDIN_FILENO) == -1)
            die ("dup2");
        if (dup2 (pfd2 [1], STDOUT_FILENO) == -1)
            die ("dup2");
        if (close (pfd1 [0]) == -1)
            die ("close");
        if (close (pfd1 [1]) == -1)
            die ("close");
        if (close (pfd2 [0]) == -1)
            die ("close");
        if (close (pfd2 [1]) == -1)
            die ("close");
        execlp ("cut", "cut", "-f1", "-d ", (char *) NULL);
        die ("execlp");    // reached only if execlp fails
    }

    /* parent process: the first pipe is no longer needed here */
    if (close (pfd1 [0]) == -1)
        die ("close");
    if (close (pfd1 [1]) == -1)
        die ("close");

    if ((pid3 = fork ()) == -1)
        die ("fork");

    if (pid3 == 0) {
        // third child, will become "uniq"
        if (dup2 (pfd2 [0], STDIN_FILENO) == -1)
            die ("dup2");
        if (close (pfd2 [0]) == -1)
            die ("close");
        if (close (pfd2 [1]) == -1)
            die ("close");
        execlp ("uniq", "uniq", (char *) NULL);
        die ("execlp");    // reached only if execlp fails
    }

    /* parent process: close the second pipe so that "uniq" sees end-of-file */
    if (close (pfd2 [0]) == -1)
        die ("close");
    if (close (pfd2 [1]) == -1)
        die ("close");

    /* wait for all three children */
    if (waitpid (pid1, NULL, 0) == -1)
        die ("waitpid");

    if (waitpid (pid2, NULL, 0) == -1)
        die ("waitpid");

    if (waitpid (pid3, NULL, 0) == -1)
        die ("waitpid");

    return EXIT_SUCCESS;
}

We can compile and run the above program.

$ gcc pipeline.c -o pipeline
$ ./pipeline
alice
bob
carol

The parent process creates a pipe. Now, we must remember that a process's system data, such as its open file descriptors and its current directory, is inherited by the child process on fork and is preserved across the exec system calls. So, when a parent makes a pipe, forks a child and the child then execs a program, that program gets the pipe file descriptors. More precisely, between the fork and the exec, the child duplicates the pipe file descriptor it needs onto the standard input or standard output file descriptor and closes the original pipe file descriptors. The exec'd program reads from the standard input or writes to the standard output as per its code but, thanks to this setup, it is actually reading from or writing to the pipe.

5.0 See also

  1. Interprocess communication using FIFOs in Linux