Program, Process and Threads


The definition of a program is linked to the definition of an algorithm. An algorithm is a finite sequence of steps that can be executed mechanically to solve a problem. There are two key points. First, an algorithm is self-contained: the steps given are sufficient to solve the problem at hand, and no additional information is required. Second, the steps are finite: an algorithm must terminate. A procedure whose steps may go on forever in the hope of eventually producing the result is not an algorithm.
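Euclid's algorithm for the greatest common divisor is a standard illustration of both properties. It is chosen here as an example and is not part of the original text. Each step needs only the two current numbers, so the algorithm is self-contained. Because $0 \le a \bmod b < b$, the second argument is a non-negative integer that strictly decreases at every step, so it must reach 0 after finitely many steps, and the algorithm terminates:

```latex
\begin{aligned}
\gcd(a, 0) &= a \\
\gcd(a, b) &= \gcd(b,\; a \bmod b), \qquad b > 0 \\[4pt]
\gcd(48, 18) &= \gcd(18, 12) = \gcd(12, 6) = \gcd(6, 0) = 6
\end{aligned}
```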

An algorithm coded in a programming language like C is a program. Every program has a starting point, the first instruction to be executed. In C, a program's execution starts with the main function. A program can have only one starting point; that is, a C program has exactly one main function. A program contains the instructions and data required to solve a problem.


In order to execute a program, the operating system kernel running on a computer has to create a process. A process is an execution environment in a computer system for solving a problem. A process has an id, called the process id. It has other attributes, such as the parent process id, which identifies the process that created it, the user and group ids of the user who owns the process, and its execution times. The Linux command ps displays the attributes of the processes running on the system.

Most importantly, a process has an address space in the computer's virtual memory. This address space holds the code segment (also called the text segment), containing the program instructions, and the user data segment, containing the global data, the heap and the stack. The code and data segments of a process are initialized from the program; this is the only relationship between a process and a program. A process can initialize its code and data segments from different programs at different times and can therefore execute several programs during its lifetime. It executes only one program at a time and, in most cases, only one program in its whole lifetime. The attributes of a process are kept separately in the system data segment for the process, which lies outside the process's address space; hence they cannot be modified by the process and can only be modified by the kernel.

On a Linux system, a process is created using the fork system call. An existing process executes the fork system call, and the result is a clone of the process that executed fork. So we have two processes: the parent process, which executed the fork system call, and a newly born child process, which is (almost) a copy of the parent. In order to execute a particular program, the child process executes the exec system call, passing the file name of the program as a parameter. exec initializes the process's code and data segments from the given program, and execution of that program starts. exec is, in fact, a family of six functions: execl, execv, execlp, execvp, execle and execve. On Linux, only execve is an actual system call; the other five are C library functions built on top of it. After exec, there is really no relationship between the program and the process. Any number of processes may initialize their code and data segments from the same program using exec and execute that program.


By default, a process has a single thread of execution. It is possible for a process to have multiple threads of execution, or, in other words, multiple threads. Each thread has a starting function in the process's code segment. All threads share the process's code and global data, while each thread has its own stack. On Linux, threads are normally created and managed with the POSIX threads (pthreads) library.