fork and exec system calls in Linux

Process

The process is a fundamental concept in operating systems. The kernel runs a program by creating a process to execute it. A process has three segments: the code segment, the user data segment and the system data segment. The code and user data segments are initialized from the program, while the system data segment is managed by the kernel. Once a process starts running, it modifies its user data segment as it executes the instructions in the code segment. The system call for creating a process is fork, and the exec family of functions makes a process execute a program. The exec family of functions is built upon the execve system call.

fork system call
execve system call
exec family of functions
Difference between fork and exec
An example

1.0 fork system call

#include <sys/types.h>
#include <unistd.h>

pid_t fork (void);

When a process makes the fork system call, a new process is created which is a clone of the calling process. The code, data and a major part of system data of the new process are copied from the calling process. The newly created process is called the child process, whereas the calling process is termed the parent process. However, there is a difference between the parent and child processes. The return value of fork in the child process is 0, whereas, in the parent process, the process id of the child process is returned. Indeed, the two processes use this difference to figure out whether they are the parent or the child. If fork is unsuccessful, -1 is returned, errno is set appropriately and no child process is created.

What is the use of having a process which is a copy of its parent? Not much. But, then, the fork system call is mostly used in conjunction with a variation of exec in the child process. In Linux, there is an execve system call, and there are six functions whose names start with exec and which are front-ends to the execve system call. When we say exec in the context of Linux, we mean either the execve system call or one of the six functions described later in this tutorial. After fork, the child process calls either one of the exec functions or the execve system call to execute a program.

2.0 execve system call

The execve system call (execve(2)) is the starting point of our discussion on exec.

#include <unistd.h>

int execve (const char *filename, char *const argv [],
            char *const envp []);

What happens when a process makes the execve system call? Its code and data segments are initialized from the program contained in the file identified by filename. The most important thing to note is that it is still the same process (its pid is unchanged), but it is now executing the new program. argv is an array of arguments to the program, where, by convention, the zeroth element is the file name of the program itself. envp is an array of environment variables, each a string in the format name=value. The last element in both argv and envp must be NULL. Both argv and envp can be accessed in the main function of the program, which is called as,

int main (int argc, char *argv [], char *envp [])

However, the third parameter, envp, is not specified in POSIX.1, which stipulates that the environment variables should be accessed via the external variable environ (environ (7)).

execve does not return on success. It can't, for the code segment has been initialized from the new program being executed and the return address (in the previous program) is lost forever. However, if execve is unsuccessful, -1 is returned, and errno is set accordingly.

3.0 exec family of functions

There is an exec family of six functions (exec(3)), which provide front-ends to the execve system call. These are,

#include <unistd.h>

extern char **environ;

int execl (const char *path, const char *arg0, const char *arg1, ..., (char *) NULL);
int execlp (const char *file, const char *arg0, const char *arg1, ..., (char *) NULL);
int execle (const char *path, const char *arg0, const char *arg1, ..., (char *) NULL, char *const envp []);
int execv (const char *path, char *const argv[]);
int execvp (const char *file, char *const argv[]);
int execvpe (const char *file, char *const argv[], char *const envp[]);

The names of the first five functions above are of the form execXY. X is either l or v, depending upon whether the arguments are given as a list (arg0, arg1, …, NULL) or passed in an array (vector). Y is either absent or is p or e. If Y is p, the PATH environment variable is used to search for the program when file does not contain a slash. If Y is e, the environment passed in the envp array is used. In execvpe, X is v and Y is pe: it searches PATH like execvp and takes an envp array like execle. The execvpe function is a GNU extension and is declared only when _GNU_SOURCE is defined. It is named so as to differentiate it from the execve system call (execve (2)).

4.0 Difference between fork and exec

The major difference is that fork creates a new child process, which is a clone of the parent process, whereas exec creates no new process: the calling process is overwritten by the program whose filename is passed as the first argument. In most cases, the two are used together. A process executes the fork system call, which creates a new child process, and the child process then execs the program to be executed. Without fork, exec is of limited use. And, without exec, fork is hardly of any use.

5.0 An example

As an example, let’s write two programs, parent and child. We will execute parent from the shell’s command line. The parent forks a child process, and the latter execs the child program. The parent waits for the child to do its work and terminate.

5.1 The parent program

// parent.c: the parent program

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

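int main (void)
{
    pid_t pid, ret;
    int status;
    char *myargs [] = { "child", NULL };  // argv [0] is the program name, by convention
    char *myenv [] = { NULL };

    printf ("Parent: Hello, World!\n");

    pid = fork ();

    if (pid == -1) {

        // fork failed, no child was created

        perror ("Parent: fork");
        exit (EXIT_FAILURE);
    }

    if (pid == 0) {

        // I am the child

        execve ("child", myargs, myenv);

        // execve returns only on failure; do not fall through into the parent's code

        perror ("Child: execve");
        _exit (127);
    }

    // I am the parent

    printf ("Parent: Waiting for Child to complete.\n");

    if ((ret = waitpid (pid, &status, 0)) == -1)
        perror ("Parent: waitpid");

    if (ret == pid)
        printf ("Parent: Child process waited for.\n");

    return 0;
}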

5.2 The child program

//  child.c: the child program

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define A 500
#define B 600 
#define C 700
    
int main (int argc, char **argv)
{   
    int i, j;
    long long sum = 0;

    // Some arbitrary work done by the child

    printf ("Child: Hello, World!\n");

    for (j = 0; j < 30; j++) {
        for (i = 0; i < 900000; i++) {
            // compute in long long; i * i overflows int for large i
            sum = A * (long long) i + B * (long long) i * i + C;
            sum %= 543;
        }
    }

    printf ("Child: Work completed!\n");
    printf ("Child: Bye now.\n");

    exit (0);
}

5.3 Running the parent (and child)

We compile the parent and the child programs. We run the parent program from the command line. The parent forks the child process and executes the child program.

$ # compile parent
$ gcc parent.c -o parent
$ #compile child
$ gcc child.c -o child
$ # run parent (and child)
$ ./parent
Parent: Hello, World!
Parent: Waiting for Child to complete.
Child: Hello, World!
Child: Work completed!
Child: Bye now.
Parent: Child process waited for.
$

We took a rather simple example just to illustrate the concepts of parent and child programs and the respective processes. A more realistic example is the shell. The shell is the parent process. It reads each line of input from the command line, forks a child shell process, which in turn exec’s the command. The shell parent process waits for the child to complete and then prompts for the next command.
