Softprayog.in

LINUX

THE BEGINNING

If you were a Unix programmer in the late eighties or early nineties, you would have felt the pinch. More and more managements were moving away from Unix because it cost about $2000, whereas the popular alternative operating system could be had for about $100. Add another $500 or so for a C compiler and one could get going with software development. Unix thus cost more than three times as much. There was little that programmers used to Unix could do about it, as it was for the managements to buy the software and make it available to the developers.

There is a saying that you can't stop an idea whose time has come. Or that you can't keep something good under wraps for long. Or that necessity is the mother of invention. The date was August 25, 1991. A young Finnish student in Helsinki, Linus Torvalds, who had been working on a terminal emulator program for accessing the university's Unix servers, realized that he was, in fact, working towards a new operating system. Torvalds announced it in a Usenet posting to the newsgroup "comp.os.minix":

Hello everybody out there using minix -

I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready. I'd like any feedback on things people like/dislike in minix, as my OS resembles it somewhat (same physical layout of the file-system (due to practical reasons) among other things).

I've currently ported bash(1.08) and gcc(1.40), and things seem to work. This implies that I'll get something practical within a few months, and I'd like to know what features most people would want. Any suggestions are welcome, but I won't promise I'll implement them :-)

Linus (torvalds@kruuna.helsinki.fi)

PS. Yes - it's free of any minix code, and it has a multi-threaded fs. It is NOT portable (uses 386 task switching etc), and it probably never will support anything other than AT-harddisks, as that's all I have :-(.

History was made. The contrast between the early nineties, when one had to purchase the operating system, compilers, windowing system, libraries, etc., and today, when all of this comes as free/libre/open source software on CDs and as downloads from the Internet, is simply amazing.

THE OPERATING SYSTEM

Linux is a Unix-like operating system, developed independently by Linus Torvalds and thousands of free/libre/open source enthusiasts. It provides the functionality of Unix and for most practical purposes meets the Single Unix Specification of the Open Group, the owners of the UNIX trademark. Linux is a registered trademark of Linus Torvalds.

The Linux kernel is a monolithic kernel and provides the basic operating system functions: process management, memory management, the file system and I/O management. The associated compilers, shells and other system utilities come from the GNU project. Thus a Linux system is more accurately called a GNU/Linux system. The GNU/Linux system provides, in addition to the kernel facilities, networking, security and a graphical user interface. Networking and security are closely coupled with the kernel, whereas the graphical user interface is managed by the X server process.

SYSTEM CALL INTERFACE

The interface to the Linux kernel is provided by a few hundred system calls; the exact number depends on the architecture and the kernel version. System calls provide the basic kernel facilities; these are the basic building blocks or primitives on which higher level software can be built. A program developed using system calls can be very efficient because it interfaces directly with the kernel. Such a program should also be highly portable, provided it keeps to the system calls that are common to Unix-like systems. Rochkind's book [5] is the definitive book on kernel system calls and should be read by anyone interested in learning the kernel system call interface.
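As a small illustration of programming at the system call level, the sketch below writes a few bytes to a file and reads them back. It is written in Python only for brevity; the functions in Python's os module used here (os.open, os.write, os.lseek, os.read, os.close) are thin wrappers over the system calls of the same names. The function name write_then_read and the file name demo.txt are made up for this example.

```python
import os
import tempfile


def write_then_read(message: bytes) -> bytes:
    """Write message to a scratch file and read it back using file descriptors."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.txt")  # hypothetical file name
    # open(2): returns a small integer, the file descriptor
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, message)          # write(2)
        os.lseek(fd, 0, os.SEEK_SET)   # lseek(2): rewind to the start of the file
        return os.read(fd, len(message))  # read(2)
    finally:
        os.close(fd)                   # close(2)
        os.unlink(path)                # unlink(2): remove the scratch file
        os.rmdir(directory)
```

Everything the program does to the file goes through the file descriptor returned by open; higher level libraries such as buffered file objects are built on top of exactly these calls.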

Linux is a Unix-like operating system. Most of what is said here is true for other Unix-like operating systems also. Similarly a lot of what is said about Unix elsewhere is true for Linux as well.

PROCESS MANAGEMENT

PROCESSES

For a program to be executed on Linux, a process needs to be created. A process, identified by a process-id, is an execution environment for executing programs. A process has an address space, which comprises a code segment, a user data segment and a system data segment. When a process is created, its code and user data segments are initialized from the program. The code segment contains the program instructions, whereas the user data segment holds the global data. The system data segment holds the process descriptor, a kernel data structure that keeps data about the process, such as the process state, file descriptors, current directory and signals received. Each process also has a stack, used for passing parameters between functions and for holding return values during execution.

Processes are created by the fork system call. fork creates a clone of the calling process. After fork there are two processes: the calling process, called the parent process, and the newly created child process, which is a clone of the parent. Both have identical copies of the code and data segments, except that fork returns the process-id of the child in the parent process, whereas the return value of fork in the child process is 0. Using the return value, each process can figure out whether it is the parent or the child. The child process, as is usually coded by the programmer, executes the exec system call, which takes the file name of the program to be executed as a parameter. exec initializes the code and data segments of the child process from the program to be executed. The old contents (the copy of the parent's code and data) are overwritten by the code and data of the new program.
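The fork and exec sequence described above can be sketched as follows. Python's os.fork, os.execvp and os.waitpid wrap the corresponding system calls; the function name run_in_child and the exit status 127 used when exec fails are choices made for this example, not something prescribed by the kernel.

```python
import os
import sys


def run_in_child(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # fork returned 0: this is the child process.
        try:
            os.execvp(argv[0], argv)  # replaces the child's code and data
        finally:
            # Reached only if exec failed; leave without running parent cleanup.
            os._exit(127)
    # fork returned the child's process-id: this is the parent process.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # Run a second Python interpreter that exits with status 7.
    print(run_in_child([sys.executable, "-c", "raise SystemExit(7)"]))
```

The parent waits for the child with waitpid, which is how a shell behaves when a command is run in the foreground.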

THREADS

During the last fifteen years, it has been realized that there are a lot of overheads in process creation and in the associated context switching during scheduling. There has been a paradigm shift in concurrent programming from processes to threads. Threads are multiple units of parallel execution in the context of a single process. Threads are faster to create and share all the process resources, like files and global data, which makes time-consuming inter-process communication unnecessary. Today a lot of high performance applications are developed using threads, taking advantage of advances in hardware, especially multiple CPUs.
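A minimal sketch of threads sharing data within one process is given below. The name count_with_threads is made up for this example. Because all threads see the same counter, no inter-process communication is needed, but a lock is required so that concurrent updates do not interfere with each other. Note that in the standard CPython interpreter a global interpreter lock limits how much CPU-bound work actually runs in parallel, so this shows the programming model rather than a speed-up.

```python
import threading


def count_with_threads(num_threads, increments):
    """Have several threads increment one shared counter; return the total."""
    counter = {"value": 0}      # shared by all threads of the process
    lock = threading.Lock()     # serializes updates to the shared counter

    def worker():
        for _ in range(increments):
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()           # wait for every thread to finish
    return counter["value"]
```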

SCHEDULER

The Linux kernel schedules the processes in the system for running on the CPU. Up to kernel version 2.6.22, Linux had an O(1) scheduler, which took the same time for its work regardless of the load. With kernel version 2.6.23, the Completely Fair Scheduler (CFS) was incorporated, which is a much simpler scheduler. CFS emphasizes fairness, providing good results for desktop as well as server environments. CFS is an O(log n) scheduler, and it has been providing good results in high-load multi-threaded environments.
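The idea of fairness at O(log n) cost can be illustrated with a toy model. This is not the kernel's implementation: CFS keeps runnable tasks in a red-black tree ordered by the CPU time each has received, whereas the sketch below substitutes a binary heap, which also gives O(log n) insertion and removal. The function name, the fixed time slice and the equal task weights are all assumptions made for the example.

```python
import heapq


def simulate_fair_scheduler(task_names, time_slice, total_slices):
    """Always run the task that has received the least CPU time so far."""
    # Each entry is (CPU time received so far, task name).
    queue = [(0, name) for name in task_names]
    heapq.heapify(queue)
    order = []
    for _ in range(total_slices):
        runtime, name = heapq.heappop(queue)                 # O(log n)
        order.append(name)                                   # "run" the task
        heapq.heappush(queue, (runtime + time_slice, name))  # O(log n)
    return order
```

With three tasks and equal time slices, each task gets the CPU in turn, for example simulate_fair_scheduler(["a", "b", "c"], 1, 6) returns ["a", "b", "c", "a", "b", "c"].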

MEMORY MANAGEMENT

Like all modern operating systems, Linux provides virtual memory, a technique that gives processes an address space larger than the physical memory available in the system. Each process has its own virtual address space. The virtual memory management technique used is paging. Both the physical address space and the virtual address space are divided into pages of fixed size, say 4 Kbytes. The translation of a virtual address to a physical address is done by a hardware unit called the memory management unit (MMU). The MMU consults one or more page tables, maintained by the kernel, to translate a given virtual address into the corresponding physical address. Pages which are not being used are swapped out to disk. If a required page is not in memory, a page fault is generated; the kernel loads the required page into memory and makes a corresponding entry in the page table. Linux keeps track of page age, which indicates how recently a page was used, and uses a Least Recently Used (LRU) policy to choose old pages to swap out to disk.
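The address translation step can be sketched with a single-level page table, modelled here as a dictionary from virtual page number to physical page number. Real hardware uses multi-level page tables and does this work in the MMU; the names translate and PageFault, and the flat dictionary, are simplifications assumed for the example.

```python
PAGE_SIZE = 4096  # 4 Kbytes, as in the text


class PageFault(Exception):
    """Raised when the virtual page has no entry in the page table."""


def translate(page_table, virtual_address):
    """Translate a virtual address using a {virtual page: physical page} table."""
    # Split the address into a page number and an offset within the page.
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # The kernel would now load the page and add the missing entry.
        raise PageFault(page_number)
    # The offset within the page is unchanged by translation.
    return page_table[page_number] * PAGE_SIZE + offset
```

For example, with virtual page 1 mapped to physical page 2, virtual address 4100 (page 1, offset 4) translates to 2 * 4096 + 4 = 8196.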

DISK AND FILE SYSTEM

FILES

There are five main types of files under Linux: regular files, directories, symbolic links, device files and pipes. (Sockets are a further type, not discussed here.) Regular files are the ones we use for our day to day work; a regular file is simply a sequence of bytes. Looking inside a regular file, we find that it may be a text file containing printable characters, with each line ending with a newline ('\n') character. In the Windows world, each line ends with a carriage return ('\r') followed by a newline ('\n'). This difference causes trouble when text files are moved between the two systems; the problem is solved by two small programs, dos2unix and unix2dos. The former reads each line of the input file and simply writes it out without the '\r' character. unix2dos does just the opposite, reading each line of the input file and inserting a '\r' just before the '\n'. Other than a text file, a file may be a binary file, where each byte can have any value from 0 to 255. It is for user programs to interpret the structure of binary files.
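The conversion itself is simple enough to sketch in a few lines. The functions below work on bytes already read from a file and are named after the two programs only for readability; they are an illustration of the idea, not the source of the real dos2unix and unix2dos utilities.

```python
def dos2unix(data: bytes) -> bytes:
    """Drop the '\\r' from every '\\r\\n' line ending."""
    return data.replace(b"\r\n", b"\n")


def unix2dos(data: bytes) -> bytes:
    """Insert a '\\r' before every '\\n'."""
    # Normalize first so that lines already ending in '\r\n' are not doubled.
    return dos2unix(data).replace(b"\n", b"\r\n")
```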

Directories are affectionately called folders in many graphical user interfaces. In modern day operating systems, the entire directory structure is a tree, where root is the first node and each node is a directory or one of the other four file types mentioned above. Each user has a "home" directory, which is the working directory once the user has logged in. The concept of home directory has got slightly blurred because the graphical user interface displays the "desktop" after login, which is itself a directory under the home directory.

A symbolic link is a file that contains the path to another (target) file. A program writing to the symbolic link "sees" the target file for all practical purposes. In that sense, a symbolic link is just another name for the target file.
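A short sketch of this behaviour follows: data appended through the link ends up in the target file. The function name symlink_demo and the file names are made up for the example; os.symlink, os.readlink and os.path.islink are the standard Python wrappers.

```python
import os
import tempfile


def symlink_demo():
    """Write through a symbolic link and show that the target file changes."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "target.txt")  # hypothetical names
        link = os.path.join(directory, "link.txt")
        with open(target, "w") as f:
            f.write("original\n")
        os.symlink(target, link)        # link now contains the path to target
        with open(link, "a") as f:      # opening the link opens the target
            f.write("via link\n")
        with open(target) as f:
            content = f.read()
        return os.path.islink(link), os.readlink(link) == target, content
```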

An entire hard disk partition is represented by a device file, somewhere under the /dev directory. It is a block device file because data is written in blocks, which are generally 4 Kbytes in size. There is a buffer cache in memory. Blocks are cached in memory and I/O is done on the buffer cache. Modified blocks are written to disk periodically.

There are two types of pipes. The first type is a simple inter-process communication mechanism that connects standard output of first process to the standard input of second, like

ls -ls | more

For two processes to communicate using a pipe as shown above, they must have a common ancestor that created the pipe. In this case, the shell is the common parent of ls and more. It creates the pipe before creating the two processes, so that both inherit the pipe's file descriptors.
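What the shell does for such a pipeline can be sketched roughly as below: create the pipe, fork two children, connect the write end to the standard output of the first and the read end to the standard input of the second, then exec the two programs. The function name pipeline and the exit status 127 for a failed exec are assumptions of this example.

```python
import os
import sys


def pipeline(producer_argv, consumer_argv):
    """Run 'producer | consumer' and return the consumer's exit status."""
    read_end, write_end = os.pipe()   # created by the common parent

    def spawn(argv, pipe_fd, std_fd):
        pid = os.fork()
        if pid == 0:
            try:
                os.dup2(pipe_fd, std_fd)   # make the pipe end stdin or stdout
                os.close(read_end)         # original descriptors not needed
                os.close(write_end)
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)              # reached only if something failed
        return pid

    producer = spawn(producer_argv, write_end, 1)  # stdout -> pipe
    consumer = spawn(consumer_argv, read_end, 0)   # pipe -> stdin
    # The parent must close its copies, or the consumer never sees end of file.
    os.close(read_end)
    os.close(write_end)
    os.waitpid(producer, 0)
    _, status = os.waitpid(consumer, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # Roughly equivalent to the shell command: ls -ls | wc -l
    pipeline(["ls", "-ls"], ["wc", "-l"])
```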

The second type is called a "named" pipe or fifo and shows up in the directory listing as a file. This can be used for inter-process communication between any two processes, not necessarily related. Once created using the mkfifo system call, it can be opened like a file by any process and used for communication with some other process.
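A minimal sketch of a FIFO in use is given below. For convenience the writer is a forked child, but it finds the FIFO purely by its path name, just as an unrelated process would. The function name fifo_roundtrip and the file name demo.fifo are made up for the example.

```python
import os
import tempfile


def fifo_roundtrip(message: bytes) -> bytes:
    """Send message through a named pipe from a child process to the parent."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.fifo")  # hypothetical name
        os.mkfifo(path, 0o600)        # the FIFO now shows up in the directory
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Opening for writing blocks until a reader opens the FIFO.
                with open(path, "wb") as writer:
                    writer.write(message)
                status = 0
            finally:
                os._exit(status)      # leave without running parent cleanup
        # Opening for reading blocks until the writer opens the FIFO.
        with open(path, "rb") as reader:
            data = reader.read()      # returns at end of file (writer closed)
        os.waitpid(pid, 0)
        return data
```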

FILE SYSTEM

A file system (or filesystem) is a way of organizing files (and their data blocks) in a disk partition. Linux provides a Virtual Filesystem (VFS), which is a kernel layer that provides the same system calls for all the filesystems that can be used with Linux. VFS ensures that although the underlying file systems may be as diverse as FAT16 and Plan 9, programs can access them in a common way using the same system calls. The native filesystems supported are ext2 and ext3. Some other filesystems supported are FAT32, NTFS, AIX, OS2, Minix, Solaris, Amoeba, Darwin UFS, FreeBSD, etc.

NETWORKING

Linux takes forward the rich networking legacy of Unix. As the TCP/IP networking protocols were developed on Unix systems, Unix has been a good choice for networking from the point of view of efficiency, scalability, availability and security. Given the efficiency of Linux, it has been a popular choice for network servers on the Internet. Linux provides the complete TCP/IP stack. The LAMP platform comprises Linux, the Apache web server, the MySQL DBMS and PHP/Perl/Python for web programming. Over the years, the LAMP platform has become very popular.

SECURITY AND RELIABILITY

The security and reliability of Linux stem from the Free/Libre and Open Source Software (FLOSS) philosophy. Since the source code is publicly available, it is seen by programmers around the world, and bugs, errors and malicious code tend to be identified and corrected quickly. Viruses infecting Linux machines are rare in practice; it can happen, but it is far from an everyday concern for most users. It is quite common for Linux systems to have an uptime close to a year.

SHELL

Traditionally, Unix had a command line user interface via a program called the shell. Steve Bourne developed the original shell for Unix. Linux has the Bourne-Again Shell (bash), which is the most popular shell. There are other shells like csh (C Shell), ksh (Korn Shell) and tcsh. The shell gives complete flexibility in entering user commands. As Linux commands have a myriad of options, it is not possible to give a fine-tuned command from a graphical user interface; the number of radio buttons and checkboxes would be far too many. So the shell continues to be a great tool. It gives the option of a programmable user interface by which we can combine various commands in a pipeline, apply filters to command outputs and, in general, work fast on the computer. A great way to learn about the shell is to read the timeless classic, The Unix Programming Environment by Kernighan and Pike [8].

GRAPHICAL USER INTERFACE

The success of graphical user interfaces (GUI) stems from the famous saying, "A picture is worth a thousand words." Indeed, a GUI offers a high bandwidth interaction with the computer. Also, it is aesthetically more pleasant to work with a GUI than with the command line. The foundation of GUI on Unix comes from the X Window System, or simply X, developed by Bob Scheifler and Jim Gettys at MIT during the mid-1980s. Under X, the client and server concepts are somewhat interchanged. The display is the X server, because multiple client applications display their output through the X server. X is a network-transparent windowing system; the client and server may be on different hosts anywhere on the network. The software stack for GUI has the X protocol for communication between client and server, Xlib for programming X applications (clients), the Xt Intrinsics toolkit providing GUI widgets and, finally, the high-level GUI toolkit. One of the guiding principles of X was to provide mechanism and not policy. The policy was the prerogative of the user. Thus many GUIs with different look-and-feel have come up on X; Motif, GTK+ and Qt are the more popular ones.

REFERENCES:

1. The Linux Foundation, http://www.linux-foundation.org/en/Main_Page

2. The Open Group, http://www.unix.org/what_is_unix/flavors_of_unix.html

3. Wikipedia, The Free Encyclopedia, Linux, http://en.wikipedia.org/wiki/Linux

4. GNU Operating System, http://www.gnu.org

5. Marc J. Rochkind: Advanced UNIX Programming, Second Edition, Addison-Wesley, 2004.

6. M. Tim Jones: Anatomy of the Linux kernel, History and architectural decomposition, http://www.ibm.com/developerworks/linux/library/l-linux-kernel/

7. Avinesh Kumar: Introducing the CFS for Linux, http://www.ibm.com/developerworks/linux/library/l-cfs/

8. Brian W. Kernighan and Rob Pike, The Unix Programming Environment, Prentice Hall, 1984.