Files in Linux

1.0 Files

There are two basic concepts in Linux: processes and files. Processes do things, and files hold the data. An efficient filesystem is therefore important for an operating system. When Unix was conceived around 1969-70, several design decisions were taken to keep the filesystem simple. The reasoning was that a simple design would be efficient and would provide a strong foundation for software development and operations.

Another important design decision in Unix was the generalization that anything on which I/O is done is a file. So regular files, directories, devices and interprocess communication mechanisms such as pipes, fifos and sockets are all files. A program accesses any open file through a file descriptor. This generalization also simplified commands, since a command that reads or writes a file can work with many kinds of files.
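
As an illustration of the file descriptor idea, here is a minimal Python sketch. It reads a regular file and a pipe with the same helper, using only the descriptor-level calls os.open, os.read, os.write and os.close. The function names read_all and demo_descriptors and the file name note.txt are made up for this example.

```python
import os
import tempfile


def read_all(fd):
    """Read from a file descriptor until end of file."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:          # an empty result means end of file
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo_descriptors():
    with tempfile.TemporaryDirectory() as d:
        # A regular file, accessed through a descriptor.
        path = os.path.join(d, "note.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        os.write(fd, b"from a file\n")
        os.close(fd)
        fd = os.open(path, os.O_RDONLY)
        from_file = read_all(fd)
        os.close(fd)

    # A pipe, accessed in exactly the same way.
    r, w = os.pipe()
    os.write(w, b"from a pipe\n")
    os.close(w)                # closing the write end gives the reader end of file
    from_pipe = read_all(r)
    os.close(r)
    return from_file, from_pipe


if __name__ == "__main__":
    print(demo_descriptors())
```

The helper read_all does not know or care what kind of file is behind the descriptor.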

2.0 Disk, Partitions and Filesystems

The hard disk provides the medium on which data in files can be stored. Data in files persists even when the system is switched off. There may be multiple disks and similar secondary storage media. Each disk may have one or more partitions. Each partition can have a filesystem. There is a root filesystem, which is the filesystem for the system. When the system is powered on, it is mounted and is available. The root filesystem has the root directory, which is the starting point for traversing the tree-structured filesystem. More filesystems can be mounted at directory nodes in the root filesystem. Once mounted, the files in the mounted filesystem appear to be part of the filesystem tree. The filesystem tree looks like this.
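
To see where mounted filesystems join the tree, one possible approach is to walk up from a path until a mount point is reached. The sketch below does that with os.path.ismount. The function name find_mount_point is an assumption of this example, not a standard call.

```python
import os


def find_mount_point(path):
    """Walk up from path until a directory that is a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)   # go one level up; "/" is always a mount point
    return path


if __name__ == "__main__":
    for p in ("/", os.getcwd(), os.path.expanduser("~")):
        print(p, "is on the filesystem mounted at", find_mount_point(p))
```

On a system laid out like the df example later in this document, a path under /home would report /home, and most other paths would report /.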

[Figure: Linux Filesystem]

3.0 Inodes

Although a filesystem appears to be a tree with files at its nodes, where some nodes are directories that are themselves sub-trees, the data structure implemented in a filesystem is not a tree and is quite involved. The most important point is that a file does not have a name of its own. It is identified by an i-number, which is an index into a table of inodes kept in the filesystem (at the beginning of the filesystem in the original Unix design). The inode for a file contains all the control information for the file: the file type, the permissions, the owner and group ids, the file size, the number of links to the file, the timestamps of the last access, the last modification and the last inode change (some filesystems, such as ext4, also record the creation time), and direct and indirect pointers to the blocks of data in the file. Some filesystems can also store the data of a very small file inline in the inode.
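
Much of the information kept in an inode can be inspected from a program with the stat family of calls. The following sketch uses Python's os.lstat. The function name inode_summary and the dictionary keys are chosen for this example only.

```python
import os
import stat
import time


def inode_summary(path):
    """Return the main inode fields for path, without following symbolic links."""
    st = os.lstat(path)
    return {
        "inode": st.st_ino,
        "type_and_permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "accessed": time.ctime(st.st_atime),
        "modified": time.ctime(st.st_mtime),
        "inode_changed": time.ctime(st.st_ctime),
    }


if __name__ == "__main__":
    for key, value in inode_summary(".").items():
        print(key, value, sep=": ")
```

Note that the file name is not among the fields returned; the name lives in a directory, not in the inode.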

4.0 File Types

There are many kinds of files under Linux. The major file types are regular files, directories, symbolic links, special files, named pipes and Unix domain sockets.
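
The file type is recorded in the mode field of the inode, so a program can tell these types apart. A minimal sketch using Python's stat module follows. The function name file_type and the returned strings are assumptions of this example.

```python
import os
import stat


def file_type(path):
    """Classify path by the type bits in its inode mode."""
    mode = os.lstat(path).st_mode      # lstat: do not follow symbolic links
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"


if __name__ == "__main__":
    for p in ("/", "/dev/null", "/etc/passwd"):
        if os.path.lexists(p):
            print(p, "->", file_type(p))
```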

4.1 Regular Files

A regular file is a sequence of bytes. The operating system does not put any special bytes inside a file. When Unix was introduced, it was customary for files on other systems to be organized as records, with control data stored for each record. In Unix, the data inside a file is put there only by the programs that write it; the operating system does not write any control data inside a file. Line endings are another example. Some systems put both carriage return (CR) and line feed (LF) control characters between lines so that the file can be sent directly to a display or printer. Unix stores just an LF character when the user presses ENTER to indicate a new line. When the file is displayed, the device driver puts in a CR before every LF so that the file is displayed or printed correctly. Once again, this conforms to the basic philosophy that file contents should be exactly what the user (or program) put in. If something more is required for display on a device, the device driver should add it at the time of output to the device.
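
This can be checked by writing a few lines to a file and reading the raw bytes back. The sketch below is one way to do it in Python. The function name stored_bytes and the file name lines.txt are made up for the example.

```python
import os
import tempfile


def stored_bytes(lines):
    """Write lines ending in LF, then return the raw file contents and size."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "lines.txt")
        with open(path, "wb") as f:            # binary mode: no translation at all
            for line in lines:
                f.write(line.encode("ascii") + b"\n")
        with open(path, "rb") as f:
            data = f.read()
        return data, os.path.getsize(path)


if __name__ == "__main__":
    print(stored_bytes(["ab", "c"]))
```

The file holds exactly the five bytes that were written: no CR characters, no record headers and no end-of-file marker.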

4.2 Directories

It is not necessary to remember inode numbers to access a file, because of directories. A directory is a special file. It is conceptually a two-column table that maps file names to inode numbers. Each "file name - inode number" pair is called a link. These are "hard" links and can only be made to files on the same filesystem. The number of such links is kept in the inode. When the number of links becomes zero and no process has the file open, the inode and the data blocks of the file are freed.
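
The link count can be watched from a program. The following Python sketch creates a second hard link to a file with os.link, checks the count, and then removes the first name. The function name hard_link_demo and the file names are assumptions of this example.

```python
import os
import tempfile


def hard_link_demo():
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.txt")
        second = os.path.join(d, "second.txt")
        with open(first, "w") as f:
            f.write("shared data\n")

        os.link(first, second)                 # a second name for the same inode
        a, b = os.stat(first), os.stat(second)
        same_inode = a.st_ino == b.st_ino
        links_with_two_names = a.st_nlink

        os.unlink(first)                       # removes one link, not the file
        links_with_one_name = os.stat(second).st_nlink
        with open(second) as f:
            data = f.read()                    # the data is still reachable
        return same_inode, links_with_two_names, links_with_one_name, data


if __name__ == "__main__":
    print(hard_link_demo())
```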

A file appears in at least one directory. Each row in a directory can be for a file or another directory. This leads to a tree-like impression of the filesystem, with the root directory (/), at the top. Also, it means that a single inode can appear with different file names in different directories.
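
The table view of a directory can be reproduced with os.scandir, whose entries carry both the name and the inode number. This is only a sketch; the function name directory_table is made up for the example.

```python
import os


def directory_table(directory):
    """Return the directory as a mapping from file name to inode number."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


if __name__ == "__main__":
    for name, inode in sorted(directory_table(".").items()):
        print(inode, name)
```

If the same inode number shows up in the tables of two different directories, those two entries are hard links to the same file.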

4.3 Symbolic Links

Each filesystem has its own inodes, so a "hard" link to an inode (and the file) can only be made in the filesystem in which that inode is present. Symbolic links were introduced to make it possible to link to a file in another filesystem. A symbolic link is a separate file whose data is the path of the target file.
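
A short Python sketch shows that a symbolic link has its own inode and simply stores a path. It uses os.symlink, os.readlink, os.lstat and os.stat. The function name symlink_demo and the file names are assumptions of this example.

```python
import os
import stat
import tempfile


def symlink_demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        link = os.path.join(d, "link.txt")
        with open(target, "w") as f:
            f.write("hello\n")
        os.symlink(target, link)

        stores_path = os.readlink(link) == target           # the link's data is a path
        is_link = stat.S_ISLNK(os.lstat(link).st_mode)      # lstat looks at the link itself
        own_inode = os.lstat(link).st_ino != os.stat(target).st_ino
        follows = os.stat(link).st_ino == os.stat(target).st_ino  # stat follows the link

        os.unlink(target)                                    # the link now dangles
        dangling = os.path.lexists(link) and not os.path.exists(link)
        return stores_path, is_link, own_inode, follows, dangling


if __name__ == "__main__":
    print(symlink_demo())
```

Unlike a hard link, the symbolic link does not keep the target alive: once the target is removed, the link points at nothing.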

4.4 Special Files

Special files represent devices like the hard disk or CD-ROM drive. These are mostly present in the /dev directory. There are two types of special files: block devices and character devices. On block devices, data is read and written in blocks. There is no such restriction on character devices, and even a small amount of data can be read or written. For block devices, data is cached in buffers in the kernel. Block devices also need to be random access. A disk-based filesystem can be mounted only from a block device.
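
A program can tell the two kinds of special file apart from the inode mode, and can read the device numbers that identify the driver. The sketch below assumes that /dev/null exists, as it does on a normal Linux system. The function name describe_device is made up for the example.

```python
import os
import stat


def describe_device(path):
    """Return (kind, major, minor) for a special file, or None for anything else."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "character"
    else:
        return None
    # st_rdev holds the device number; split it into major and minor parts.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    for p in ("/dev/null", "/dev/sda", "/"):
        if os.path.exists(p):
            print(p, "->", describe_device(p))
```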

4.5 Named Pipes

FIFOs, or named pipes, are used for interprocess communication. FIFOs behave just like pipes, except that they appear in the filesystem and can be opened, read and written by any process that has the permissions to do so. The standard open, read and write calls for files work on FIFOs.
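
The following Python sketch creates a FIFO with os.mkfifo and passes a message through it using ordinary open, read and write calls. For brevity the writer is a thread in the same process rather than a separate process. The function name fifo_round_trip and the file name demo.fifo are assumptions of this example.

```python
import os
import tempfile
import threading


def fifo_round_trip(message):
    """Send message through a FIFO and return what the reader received."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.fifo")
        os.mkfifo(path)                        # the FIFO now appears in the filesystem

        def writer():
            with open(path, "wb") as f:        # blocks until a reader opens the FIFO
                f.write(message)

        t = threading.Thread(target=writer)
        t.start()
        with open(path, "rb") as f:            # blocks until the writer opens the FIFO
            received = f.read()                # returns when the writer closes its end
        t.join()
        return received


if __name__ == "__main__":
    print(fifo_round_trip(b"ping\n"))
```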

4.6 Unix Domain Sockets

Unix domain sockets are also used for interprocess communication. The calls used are the same as those for network sockets. Unix domain sockets are fast, and the read and write calls can be used for sending and receiving data.
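
As a minimal sketch, the Python code below binds a Unix domain socket to a path, connects to it, and then moves data with plain os.write and os.read on the socket descriptors. Both ends live in one process to keep the example short. The function name unix_socket_round_trip and the file name demo.sock are assumptions of this example.

```python
import os
import socket
import stat
import tempfile


def unix_socket_round_trip(message):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.sock")

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)                      # creates the socket file
        server.listen(1)
        is_socket_file = stat.S_ISSOCK(os.stat(path).st_mode)

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)                   # queued until the server accepts
        conn, _ = server.accept()

        # Ordinary write and read calls work on the socket descriptors.
        os.write(client.fileno(), message)
        received = os.read(conn.fileno(), 1024)

        for s in (conn, client, server):
            s.close()
        return is_socket_file, received


if __name__ == "__main__":
    print(unix_socket_round_trip(b"hello"))
```

Apart from the AF_UNIX address family and the path used as the address, the socket calls are the ones a TCP program would use.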

5.0 Some File Handling Commands

5.1 pwd

The pwd command prints the current working directory.

$ pwd
/home/user1/src

5.2 ls

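The ls command lists files. The -l option gives a long listing and the -s option prints the allocated size of each file in blocks.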

$ ls -ls
total 60
 4 drwxr-xr-x 2 user1 user1  4096 Mar 21 20:04 dbus
28 -rw-rw-r-- 1 user1 user1 25642 Apr 15 21:00 shell.c
 4 drwxr-xr-x 5 user1 user1  4096 Feb 29 18:53 socket
 4 drwxr-xr-x 2 user1 user1  4096 Jan 12 12:17 threads
 4 drwxr-xr-x 2 user1 user1  4096 Apr 19 02:17 time
12 -rwxr-xr-x 1 user1 user1  8296 Apr  8 09:05 try
 4 -rw-r--r-- 1 user1 user1   186 Apr  8 09:05 try.c

The -i option prints the inode number of each file.

$ ls -lsi
total 60
 4063680  4 drwxr-xr-x 2 user1 user1  4096 Mar 21 20:04 dbus
52167051 28 -rw-rw-r-- 1 user1 user1 25642 Apr 15 21:00 shell.c
 4063368  4 drwxr-xr-x 5 user1 user1  4096 Feb 29 18:53 socket
52824390  4 drwxr-xr-x 2 user1 user1  4096 Jan 12 12:17 threads
 4195245  4 drwxr-xr-x 2 user1 user1  4096 Apr 19 02:17 time
52824007 12 -rwxr-xr-x 1 user1 user1  8296 Apr  8 09:05 try
52824001  4 -rw-r--r-- 1 user1 user1   186 Apr  8 09:05 try.c

5.3 file

The file command prints the type of a file.

$ file *
acpid.pid:                ASCII text
acpid.socket:             socket
alsa:                     directory
avahi-daemon:             directory
boltd:                    directory
crond.reboot:             empty
initctl:                  symbolic link to /run/systemd/initctl/fifo
initramfs:                directory
lock:                     sticky, directory
log:                      directory
ntpd.pid:                 ASCII text, with no line terminators
sendsigs.omit.d:          directory
snapd-snap.socket:        socket
snapd.socket:             socket
spice-vdagentd:           directory

5.4 du

The du command estimates the disk space used by files and, recursively, by directories.

$ du -h
56K	./socket/tcp
52K	./socket/udp
68K	./socket/select
180K	./socket
72K	./time
24K	./threads
20K	./dbus
344K	.
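
A rough idea of how such a total can be computed is sketched below in Python. It adds up the allocated blocks reported by os.lstat for every entry in a tree and counts a file with several hard links only once. This is an approximation for illustration, not the actual algorithm of du. The function name disk_usage is made up for the example.

```python
import os


def disk_usage(directory):
    """Sum the allocated space, in bytes, of a directory tree."""
    total = os.lstat(directory).st_blocks * 512     # st_blocks counts 512-byte units
    seen = set()
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            st = os.lstat(os.path.join(root, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:                         # a hard link already counted
                continue
            seen.add(key)
            total += st.st_blocks * 512
    return total


if __name__ == "__main__":
    print(disk_usage("."), "bytes allocated under the current directory")
```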

5.5 df

The df command reports the total, used and available space on mounted filesystems.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           784M  2.0M  782M   1% /run
/dev/sda3        92G   16G   71G  19% /
tmpfs           3.9G   80M  3.8G   3% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/loop3       15M   15M     0 100% /snap/gnome-characters/399
...              ...   ...   ...  ... ....
/dev/sda4       801G   81G  679G  11% /home
tmpfs           784M   20K  784M   1% /run/user/121
tmpfs           784M   40K  784M   1% /run/user/1000
/dev/loop22     291M  291M     0 100% /snap/vlc/1620
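
The same figures are available to a program through the statvfs call. The Python sketch below computes the size, used and available space for the filesystem holding a given path. The function name free_space is an assumption of this example, and the numbers may differ slightly from what df prints because of rounding.

```python
import os


def free_space(path="/"):
    """Return (size, used, available) in bytes for the filesystem containing path."""
    vfs = os.statvfs(path)
    size = vfs.f_frsize * vfs.f_blocks
    used = vfs.f_frsize * (vfs.f_blocks - vfs.f_bfree)
    available = vfs.f_frsize * vfs.f_bavail    # space usable by unprivileged users
    return size, used, available


if __name__ == "__main__":
    size, used, available = free_space("/")
    print("size", size, "used", used, "available", available)
```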

5.6 cat

The cat command prints the contents of the files passed as arguments on the standard output, which is normally the terminal.

$ cat hello.c
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[])
{
    printf ("Hello, World!\n");
}

5.7 hexdump

cat is fine for text files, but if you have a binary file and want to see its contents, you can try hexdump.

$ hexdump -cx hello
0000000 177   E   L   F 002 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000000    457f    464c    0102    0001    0000    0000    0000    0000
0000010 003  \0   >  \0 001  \0  \0  \0   0 005  \0  \0  \0  \0  \0  \0
0000010    0003    003e    0001    0000    0530    0000    0000    0000
0000020   @  \0  \0  \0  \0  \0  \0  \0   0 031  \0  \0  \0  \0  \0  \0
0000020    0040    0000    0000    0000    1930    0000    0000    0000
0000030  \0  \0  \0  \0   @  \0   8  \0  \t  \0   @  \0 035  \0 034  \0
0000030    0000    0000    0040    0038    0009    0040    001d    001c
0000040 006  \0  \0  \0 004  \0  \0  \0   @  \0  \0  \0  \0  \0  \0  \0
0000040    0006    0000    0004    0000    0040    0000    0000    0000
0000050   @  \0  \0  \0  \0  \0  \0  \0   @  \0  \0  \0  \0  \0  \0  \0
...

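For each offset, the first line shows the bytes as characters (the -c option) and the next line shows the same bytes as two-byte hexadecimal words (the -x option). On a little-endian machine the two bytes of each word appear swapped, so the bytes 7f 45 are printed as 457f. The leftmost column is the offset in the file, in hexadecimal.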
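
To make the layout of the hexadecimal lines concrete, here is a small Python sketch that formats bytes as little-endian two-byte words in the style of hexdump -x. It is an illustration only and does not reproduce every detail of hexdump. The function name hexdump_words is made up for the example.

```python
def hexdump_words(data, width=16):
    """Format data as lines of an offset followed by two-byte hexadecimal words."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        if len(chunk) % 2:
            chunk += b"\0"                     # pad an odd final byte with zero
        words = [
            int.from_bytes(chunk[i:i + 2], "little")   # low byte comes first in the file
            for i in range(0, len(chunk), 2)
        ]
        lines.append("%07x" % offset + "".join("    %04x" % w for w in words))
    return lines


if __name__ == "__main__":
    for line in hexdump_words(b"\x7fELF\x02\x01\x01\x00"):
        print(line)
```

For the first eight bytes of the ELF header shown above, this prints the words 457f, 464c, 0102 and 0001, matching the start of the second line of the hexdump output.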

5.8 Text Editor

Text files are usually created using a text editor. There are many text editors under Linux, and which one people use is a matter of taste. Popular text editors include Emacs, vi, ed, nano and gedit.