Bash idioms are tiny scripts, mostly one-liners, that accomplish a lot and can be used as building blocks in bigger scripts.
1.0 Find most frequent words
Suppose we have a bunch of text files and wish to find the most frequently used words in them. We can do that with the command,
cat * | tr -sc '[:graph:]' '\n' | sort | uniq -c | sort -nr
First, cat concatenates all the input files and pipes the result to the tr command. tr translates every character that is not a graphic (visible) character, that is, whitespace and control characters, into a newline, and the -s option squeezes each run of newlines into one. The first sort then puts the words in sorted order, one word per line, so that duplicate words end up on consecutive lines. With the uniq -c command, we replace each run of duplicates with a single line containing the count and the word. Finally, we sort numerically in reverse order to get the most frequent words at the top. For example, if the above script is put in a file named freq, with cat * changed to cat "$@" so that it reads the files named on its command line, and the file is made executable, we can find the ten most frequent words with the command,
$ ./freq info.txt | head
259 the
126 and
109 of
106 to
96 a
73 in
67 software
63 is
45 be
41 The
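One way to package the pipeline is sketched below as shell functions. It is only an illustration: the names freq and freq_nocase are chosen here, and the case-folding variant is an assumption about what a reader might want, since the sample output counts "the" and "The" separately.

```shell
# freq: count the words in the named files (or in standard input when
# no file is given) and list them with the most frequent first.
freq() {
    cat -- "$@" | tr -sc '[:graph:]' '\n' | sort | uniq -c | sort -nr
}

# freq_nocase: hypothetical variant that folds upper case to lower
# case first, so that "The" and "the" are counted as the same word.
freq_nocase() {
    cat -- "$@" | tr '[:upper:]' '[:lower:]' | freq
}
```

With these functions defined in the current shell, freq_nocase info.txt | head would list the ten most frequent words regardless of case.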
2.0 Copy files and directories recursively
rsync is a versatile copying command. Most of the time, we wish to copy files while preserving their attributes. If a source is a directory, we want it copied to the target recursively. And sometimes we wish to skip some files or directories during the copy. rsync provides all these facilities: the -a (archive) option copies recursively and preserves file attributes, -v prints the names of the files being transferred, -z compresses the data in transit, and --exclude skips names matching a pattern.
$ # copy file to current directory.
$ sudo rsync -avz ~/www/index.php .
sending incremental file list
sent 59 bytes received 12 bytes 142.00 bytes/sec
total size is 529 speedup is 7.45
$
$ # copy directory recursively to current directory.
$ sudo rsync -avz ~/www/sites .
sending incremental file list
sites/
sites/example.sites.php
sites/all/
sites/all/modules/
sites/all/modules/admin_menu/
...
$
$ # copy sites to current directory recursively but skip the
$ # "all" and "default" sub-directories
$ sudo rsync -avz --exclude all --exclude default ~/www/sites .
sending incremental file list
sites/
sites/example.sites.php
sent 1,102 bytes received 39 bytes 2,282.00 bytes/sec
total size is 2,365 speedup is 2.07
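Before a large copy, it can help to see what rsync would transfer. The sketch below wraps the same options with -n (--dry-run), which makes rsync report the files without copying them. The function name preview_copy and its argument order are made up for this illustration.

```shell
# preview_copy SRC DEST [extra rsync options...]
# Hypothetical helper: shows what "rsync -av" would copy from SRC to
# DEST without changing anything, because -n requests a dry run.
preview_copy() {
    src=$1
    dest=$2
    shift 2
    # Any remaining arguments, such as --exclude all, go straight to rsync.
    rsync -avn "$@" "$src" "$dest"
}
```

For example, preview_copy ~/www/sites . --exclude all --exclude default lists what the last command above would copy, and leaves the current directory untouched.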
3.0 List files with names sorted numerically
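If a version number is embedded in the file name, ls sorts the names as plain text and therefore does not list the files in numerical order: syslog.10.gz comes before syslog.2.gz. With the -v option of GNU ls, numbers within names are compared numerically, and we get the correct file order in the ls output.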
$ ls syslog*
syslog    syslog.10.gz  syslog.20.gz  syslog.2.gz   syslog.3.gz  syslog.5.gz  syslog.7.gz  syslog.9.gz
syslog.1  syslog.11.gz  syslog.24.gz  syslog.30.gz  syslog.4.gz  syslog.6.gz  syslog.8.gz
$ ls -v syslog*
syslog    syslog.2.gz  syslog.4.gz  syslog.6.gz  syslog.8.gz  syslog.10.gz  syslog.20.gz  syslog.30.gz
syslog.1  syslog.3.gz  syslog.5.gz  syslog.7.gz  syslog.9.gz  syslog.11.gz  syslog.24.gz
The same order is obtained by passing the ls output through sort, using . as the field separator and sorting numerically on the second field.
$ ls syslog* | sort -t . -n -k2,2
syslog
syslog.1
syslog.2.gz
syslog.3.gz
syslog.4.gz
syslog.5.gz
syslog.6.gz
syslog.7.gz
syslog.8.gz
syslog.9.gz
syslog.10.gz
syslog.11.gz
syslog.20.gz
syslog.24.gz
syslog.30.gz
When its output is piped to another command, ls -v prints one name per line and gives the same result.
$ ls -v syslog* | more
syslog
syslog.1
syslog.2.gz
syslog.3.gz
syslog.4.gz
syslog.5.gz
syslog.6.gz
syslog.7.gz
syslog.8.gz
syslog.9.gz
syslog.10.gz
syslog.11.gz
syslog.20.gz
syslog.24.gz
syslog.30.gz
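When the names come from a shell glob, the sorting step can also be done without ls. The following is a minimal sketch. The name list_by_number is invented for this example, and it assumes, like the sort command above, that the number sits in the second dot-separated field and that no file name contains a newline.

```shell
# list_by_number NAME...
# Hypothetical helper: print the given names one per line, ordered by
# the number in the second dot-separated field, so that syslog.2.gz
# comes before syslog.10.gz. A name without a second field sorts first.
list_by_number() {
    printf '%s\n' "$@" | sort -t . -k2,2n
}
```

It would be called as list_by_number syslog* in the directory that holds the log files.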
4.0 Find files based on matching patterns in contents
Consider the case where the find command gives a list of files and we wish to grep for a pattern in those files. This is easily accomplished with the xargs command, which builds and runs command lines from its standard input. For example,
$ find . -name '*.c' | xargs grep 'fread'
./alt/texttags.c: fread( &lenght, sizeof( gsize ), 1, input );
./alt/texttags.c: fread( data, sizeof( guint8 ), lenght, input );
./save.c: fread( &lenght, sizeof( gsize ), 1, input );
./save.c: fread( data, sizeof( guint8 ), lenght, input );
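File names that contain spaces or quotes are split or misread by a plain xargs. One common way around this, sketched below, is to let find run grep itself with -exec ... {} +, which passes the names as whole arguments. The function name grep_c_files is hypothetical, and the -H option (always print the file name) is assumed to be available, as it is in GNU and BSD grep.

```shell
# grep_c_files PATTERN [DIR]
# Hypothetical helper: search every *.c file under DIR (default: the
# current directory) for PATTERN. find hands the file names to grep
# directly, so names with spaces are passed intact.
grep_c_files() {
    find "${2:-.}" -type f -name '*.c' -exec grep -H -- "$1" {} +
}
```

Calling grep_c_files fread searches the same files as the pipeline above.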
5.0 List all sub-directories under a directory
ls -al lists all the files and sub-directories under a directory. But what about the case when you just want the sub-directory listing? The answer is to pipe the ls output to grep, selecting all lines starting with a d.
$ ls -al | grep '^d'
drwxrwxr-x  5 user1 user1 4096 Apr  1 07:11 .
drwxr-xr-x 65 user1 user1 4096 Apr  1 07:06 ..
drwxrwxr-x  8 user1 user1 4096 Mar 31 19:43 HelloWorld
drwxrwxr-x  2 user1 user1 4096 Apr  1 07:10 new
drwxrwxr-x  2 user1 user1 4096 Apr  1 07:11 tmp
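The grep approach also matches the . and .. entries and prints the full long listing. If only the names are wanted, find can select directories by type. A minimal sketch follows. The name list_subdirs is hypothetical, and -mindepth and -maxdepth are assumed to be supported, as they are in GNU and BSD find.

```shell
# list_subdirs [DIR]
# Hypothetical helper: print the immediate sub-directories of DIR
# (default: the current directory), hidden ones included. -mindepth 1
# leaves out DIR itself and -maxdepth 1 stops find from descending.
list_subdirs() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d
}
```

The names are printed in no particular order, so the output can be piped to sort if a stable order is needed.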