C Programming Tutorial – Getting Started

1.0 Introduction

C is a procedural programming language developed by Dennis Ritchie at Bell Labs in 1972. It is, arguably, one of the most widely used programming languages of the last fifty years. C has some distinctive features. It is a small language, so it can be learnt fairly quickly. It can be used for almost all kinds of programming tasks: for example, controlling the hardware in an embedded system, business process modelling, and processing large amounts of data in weather forecasting. Well-written C programs are compact and efficient. There are systems with millions of lines of C code, which shows that large, robust and reliable systems can be built with the language.

2.0 Hello, World

Traditionally, the first program written in C, as in many other languages, is Hello, World. Keeping with the tradition, the program is,

// hello.c : Hello, World

#include <stdio.h>

int main ()
{
    printf ("Hello, the Beautiful World!\n");
}

The line starting with a double forward slash (//) is a comment. All text to the right of // on a line is treated as a comment and is ignored by the compiler. Multi-line comments can be enclosed between /* and */. The line starting with #include is a directive to the preprocessor to include the standard I/O library header file, stdio.h, which declares I/O functions such as printf. Another important thing to note is the line,

int main ()

which starts the main function. A C program is a collection of functions, one of which must be named main. Apart from the name, a function has a body consisting of statements enclosed in a pair of matching curly brackets, or braces. When we run the program, execution begins at the first statement of the main function. We can compile and run the program as shown below.

$ gcc hello.c -o hello
$ ./hello
Hello, the Beautiful World!

The -o option tells the compiler to name the final executable file hello instead of the default, a.out. The commands to compile and execute the code depend upon the compiler and the operating system used, and may be different in your case. Here, the gcc compiler under GNU/Linux has been used.

3.0 Constants, Variables and Expressions

Consider the following program to find the circumference and area of a circle.

// circle.c : find circumference and area of circle

#include <stdio.h>

#define PI 3.14159

int main ()
{
    float radius = 2.5;  // cm
    double circumference, area;

    circumference = 2 * PI * radius;

    area = PI * radius * radius;

    printf ("Circumference = %.2f cm  area = %.2f sq cm \n", circumference, area);
}

Let's look at the above program in some detail. The comment and #include lines are similar to the earlier program. The line,

#define PI 3.14159

defines a constant PI with the value 3.14159. Once a constant is defined with a #define directive, we can use the name PI in calculations in the rest of the program file. This reduces the chance of mistyping the value of π, and if we need a more accurate value of π, we only need to change it in one place, the #define for PI.

Next, we have the variables radius, circumference and area. radius is a single precision floating point variable, whereas circumference and area are double precision. The formulae for circumference and area are expressions, formed by combining variables and constants with operators. Then we have the library function printf for printing the circumference and area. The first parameter to printf is the format string. Characters are printed as given in the format string until a % is encountered, which starts a format specification describing how the value of the next parameter is to be printed. The format specification may have a field width, which is not present here. The field width may be followed by a decimal point and an integer, specifying the number of digits to be displayed after the decimal point. The format specification ends with a conversion specifier, which, in this case, is f. There is one format specification for each parameter following the format string. We can compile and run the above program.

$ gcc circle.c -o circle
$ ./circle
Circumference = 15.71 cm  area = 19.63 sq cm
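
The field width mentioned above can be tried out with a small sketch. The helper name format_measure and the width of 8 are assumptions made for illustration; snprintf is the standard library variant of printf that writes into a character buffer instead of standard output.

```c
// width.c : field width and precision in a format specification
// (format_measure and the width of 8 are illustrative choices)

#include <stdio.h>

// Writes value into buf, right-justified in a field of 8 characters
// with 2 digits after the decimal point. Returns the number of
// characters in the formatted text (excluding the terminating null).
int format_measure (char *buf, size_t size, double value)
{
    return snprintf (buf, size, "%8.2f", value);
}
```

For the value 19.6349, the buffer holds three spaces followed by 19.63, so a column of such numbers lines up on the decimal point.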

4.0 Arrays and Loops

An array is a collection of items of the same type. Each element is identified by its index in the array and can be accessed by adding an offset to the start of the array. The offset is the product of the index of the item and the size of an item in bytes. For example,

int num [10];

num is an array of 10 integers. Array indexes start from 0. So num has 10 integers, viz., num [0], num [1], num [2], etc., the last being num [9].
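
The relation between index and offset can be checked with a short sketch. The helper byte_offset is a hypothetical name introduced here; it measures the distance in bytes between an element and the start of the array.

```c
// offset.c : offset of an array element from the start of the array
// (byte_offset is a hypothetical helper name)

#include <stddef.h>

// Returns the distance in bytes between base [index] and base [0].
size_t byte_offset (const int *base, size_t index)
{
    const char *start = (const char *) base;
    const char *item = (const char *) &base [index];

    return (size_t) (item - start);
}
```

For the array num above, byte_offset (num, 3) equals 3 * sizeof (int), whatever the size of an int is on the machine.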

A loop is a collection of statements which are executed repeatedly. C has the while and for loops and also the do-while loop. The previous example calculated the area and circumference for a single circle. If we have an array of radii of 10 circles and want to calculate the circumference and area for each, the program can be modified like this.

// circle.c : find circumference and area of circle (array and for loop)

#include <stdio.h>

#define PI 3.14159
#define ARRAY_SIZE 10 

int main ()
{
    float radius [ARRAY_SIZE] = {2.5, 3.5, 4.5, 5.5, 7.3, 8.9, 9.2, 11.7, 14.1, 17.6};  // cm
    double circumference, area;

    for (int i = 0; i < ARRAY_SIZE; i++) {
        circumference = 2 * PI * radius [i];

        area = PI * radius [i] * radius [i];

        printf ("(%d) Circumference = %.2f cm  area = %.2f sq cm \n", i + 1, circumference, area);
    }
}

In the above example, radius becomes an array of size ARRAY_SIZE, which is #defined as 10. The fact that the array index starts at 0 and goes up to the array size minus 1 is reflected in the for loop,

for (int i = 0; i < ARRAY_SIZE; i++)

which is an idiom for processing an array. There are quite a few idioms like this in C, and it is good practice to learn and use them. Notice that i is defined as an int inside the for statement and is valid only within the for loop. Also, it is natural to start from zero when processing the array, but in everyday life we count from 1. So the printf has the expression i + 1, which prints the serial number for each line of output. After compiling and executing the program, we get the following results.

$ gcc circle.c -o circle
$ ./circle
(1) Circumference = 15.71 cm  area = 19.63 sq cm 
(2) Circumference = 21.99 cm  area = 38.48 sq cm 
(3) Circumference = 28.27 cm  area = 63.62 sq cm 
(4) Circumference = 34.56 cm  area = 95.03 sq cm 
(5) Circumference = 45.87 cm  area = 167.42 sq cm 
(6) Circumference = 55.92 cm  area = 248.85 sq cm 
(7) Circumference = 57.81 cm  area = 265.90 sq cm 
(8) Circumference = 73.51 cm  area = 430.05 sq cm 
(9) Circumference = 88.59 cm  area = 624.58 sq cm 
(10) Circumference = 110.58 cm  area = 973.14 sq cm 
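
The while and do-while loops mentioned earlier can express the same iteration as the for loop. The following is a minimal sketch that sums an array in all three ways; the function names are illustrative and not part of the program above.

```c
// loops.c : the same sum written with for, while and do-while
// (the function names are illustrative)

#include <stddef.h>

double sum_for (const double *items, size_t count)
{
    double sum = 0.0;

    for (size_t i = 0; i < count; i++)
        sum += items [i];

    return sum;
}

double sum_while (const double *items, size_t count)
{
    double sum = 0.0;
    size_t i = 0;

    while (i < count) {
        sum += items [i];
        i++;
    }

    return sum;
}

// The do-while body runs at least once, so an empty array
// has to be handled before entering the loop.
double sum_do_while (const double *items, size_t count)
{
    double sum = 0.0;
    size_t i = 0;

    if (count == 0)
        return sum;

    do {
        sum += items [i];
        i++;
    } while (i < count);

    return sum;
}
```

The for loop keeps the initialization, the test and the increment on one line, which is why it is the usual choice for walking through an array.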

5.0 Input and Output

The above examples are somewhat contrived as they produce output but do not take any input. Most programs take some input, process it and produce some output. The radius is hard coded in the above examples, which severely restricts their utility. It is more natural to think of a program where the user inputs the radius and the program computes and prints the circumference and the area. The above example can be modified to do this.

// circle.c : find circumference and area of circle

#include <stdio.h>

#define PI 3.14159

int main ()
{
    float radius;  // cm
    double circumference, area;

    printf ("Radius: ");
    while (scanf ("%f", &radius) == 1) {
        circumference = 2 * PI * radius;

        area = PI * radius * radius;

        printf ("Circumference = %.2f cm  area = %.2f sq cm \n", circumference, area);
        printf ("Radius: ");
    }

    printf ("\n");
}

In the above program, we use scanf for input, which is the converse of printf. Notice the ampersand before radius in the scanf call, which means the address of radius. This lets scanf access the location of radius and store the value read there. The radius variable need not be an array now, as there is only one radius at any time in the program.

Notice the \n, all by itself, in the last printf statement. \n stands for the newline character. When the user presses Control-D, it signals the end of input, and the while loop terminates. Left as it is, the program would terminate in the middle of a line. To start a new line on a terminal, two control characters are needed: a carriage return (CR) to bring the cursor back to the first column, and a line feed (LF) to move the cursor to the next line. The term "carriage return" comes from typewriters and printing terminals. Together, these characters are called CR-LF, which is somewhat pronounceable. LF is the newline character, denoted by \n. When the program writes an LF (\n), the kernel's terminal driver magnanimously chips in a CR and the job is done.

When we compile and run the latest circle program, we get the following output.

$ gcc circle.c -o circle
$ ./circle
Radius: 2
Circumference = 12.57 cm  area = 12.57 sq cm 
Radius: 3.4
Circumference = 21.36 cm  area = 36.32 sq cm 
Radius: 6.89
Circumference = 43.29 cm  area = 149.14 sq cm 
Radius: 
$ 
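
The value returned by scanf is worth a closer look: it is the number of items successfully converted, or EOF if the input ends before any conversion. The sketch below uses sscanf, the standard variant that reads from a string, so that the three cases can be seen without typing at a terminal. The helper name parse_radius is an assumption made for illustration.

```c
// parse.c : return value of the scanf family
// (parse_radius is a hypothetical helper; sscanf reads from a string)

#include <stdio.h>

// Returns 1 if a number was converted and stored in *radius,
// 0 if the text does not start with a number,
// and EOF if the text ran out before any conversion.
int parse_radius (const char *text, float *radius)
{
    return sscanf (text, "%f", radius);
}
```

A loop that compares the result only against EOF keeps running when the input is not a number, because the offending characters are never consumed. Comparing the result with the number of items expected, 1 in this case, stops the loop on both end of input and bad input.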

6.0 Character Input and Output

We have seen the printf and scanf functions for I/O above. These do formatted I/O for basic types like int, float, double, etc. Underneath, the basic I/O is a stream of characters. One reads or writes an arbitrary number of characters, which, collectively, can have a higher level of interpretation. Unix and Unix-like systems provide three open files to all programs by default: standard input, standard output and standard error. There are two basic functions for character I/O: getchar (), which returns the next character from standard input, and putchar (), which writes the given character to standard output. These functions are declared as,

#include <stdio.h>

int getchar (void);

int putchar (int c);

Both getchar and putchar deal with characters represented as integers. The return type of getchar is int because getchar must be able to return end of file (EOF) in addition to every possible character. putchar returns the character written, or EOF on error, so its return type is also int. We can write a file copying program using getchar and putchar. The program given below copies standard input to standard output.

// copy.c: copy standard input to standard output

#include <stdio.h>

int main ()
{
    int c;

    c = getchar ();

    while (c != EOF) {
        putchar (c);

        c = getchar ();
    }
}

The type of c is int because getchar () returns an int. As mentioned before, getchar () has to be able to return end of file, EOF, as well, and the variable c must be able to store that value. We can compile and run the copy program. Since it copies standard input to standard output, we pass file names to it via redirection. We then use the diff program to check whether the input and output files are the same.

$ gcc copy.c -o copy
$ ./copy <copy.c >copy.x
$ diff copy.c copy.x
$ 

Since diff produces no output, there are no differences between the input and the output, and we can conclude that the two files are the same.
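
The reason for declaring c as an int can be seen in a small sketch. It assumes only that EOF is a negative int, as the C standard requires; the function names are illustrative.

```c
// eof.c : why the variable receiving getchar () must be an int
// (both function names are illustrative)

#include <stdio.h>

// EOF stored in an int is still recognisable as EOF.
int eof_survives_in_int (void)
{
    int c = EOF;

    return c == EOF;
}

// EOF squeezed into an unsigned char becomes an ordinary
// non-negative value and no longer compares equal to EOF.
int eof_survives_in_unsigned_char (void)
{
    unsigned char c = (unsigned char) EOF;
    int widened = c;  // what a comparison with EOF would actually see

    return widened == EOF;
}
```

The first function returns 1 and the second returns 0. A copy loop whose variable is an unsigned char would therefore never notice the end of the file.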

In C, an assignment is an expression, and this expression has the value that was assigned to the identifier on the left. So, an assignment like,

c = getchar ();

can appear inside a larger expression, where it stands for the value assigned to c. So, in the above file copy program, we can move the assignment c = getchar () inside the condition of the while loop, and remove the two separate assignments, one before the loop and the other inside it. The file copy program becomes,

// copy1.c: copy standard input to standard output

#include <stdio.h>

int main ()
{
    int c;

    while ((c = getchar ()) != EOF)
        putchar (c);
}

It is important to enclose the assignment c = getchar () inside a pair of parentheses. The precedence of the assignment operator, =, is lower than that of the inequality operator, !=. If we do not enclose the assignment in parentheses, it would mean "calculate getchar () != EOF and put its value in c", which is not what we want.

$ gcc copy1.c -o copy1
$ ./copy1 <copy1.c >copy1.x
$ diff copy1.c copy1.x
$ 

Embedding an assignment inside the condition of a while loop is an idiom which occurs very often in C programs.
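
The effect of the parentheses around the assignment can be seen in isolation in the following sketch. The function names are illustrative, and the parameter next stands in for the value that getchar () would return.

```c
// precedence.c : effect of the parentheses around an assignment
// (function names are illustrative; next stands in for the value
// returned by getchar ())

#include <stdio.h>

// Without parentheses, != is evaluated first, so c receives
// the result of the comparison: 1 or 0.
int assign_without_parentheses (int next)
{
    int c;

    c = next != EOF;

    return c;
}

// With parentheses, c receives the character itself and the
// comparison is made on that value.
int assign_with_parentheses (int next)
{
    int c;
    int more_input = (c = next) != EOF;

    (void) more_input;  // only c is of interest here

    return c;
}
```

For the character 'A', the first function returns 1 and the second returns 'A'. A copy loop written without the parentheses would write the character with value 1 for every input character.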

7.0 Character arrays and Strings

We have seen an array of floats in an earlier example,

float radius [ARRAY_SIZE] = {2.5, 3.5, 4.5, 5.5, 7.3, 8.9, 9.2, 11.7, 14.1, 17.6};  // cm

Taking a cue, we might think of an array of characters, like

char ch_array [ARRAY_SIZE] = {'a', 'k', 't', 'j', '4', 'o', 'l', 'd', 'c', 'v'}; 

This is not very interesting for the simple reason that the elements of the above array do not represent anything in real life. In the array of floats given in the earlier example, each element represented the radius of a circle. In C, when a null character (value 0, represented by '\0') is placed after the last character in a character array, the resulting array is called a string and is used as an aggregate entity. A string may be a line of input, a word, a name, etc. For example, "chair" is a string and is represented in memory as,

[Figure: the C language string "chair" in memory: 'c', 'h', 'a', 'i', 'r', '\0']
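
The role of the terminating null character can be illustrated with a short sketch. The function string_length is a hypothetical stand-in for the library function strlen, written out to show how the end of a string is found.

```c
// string.c : a string is a character array ending in '\0'
// (string_length is a hypothetical stand-in for the library's strlen)

#include <stddef.h>

// Counts the characters before the terminating null character.
size_t string_length (const char *s)
{
    size_t n = 0;

    while (s [n] != '\0')
        n++;

    return n;
}
```

For a declaration like char word [] = "chair"; the value of string_length (word) is 5, while sizeof word is 6, the extra byte being the null character.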

8.0 Program structure

A program consists of global data and a collection of functions. One of the functions is the main function, and execution starts there. A function is defined like this.

<return-type> <function-name> (type arg1, type arg2, ...)
{
    .....
    .....
    .....

    return value;
}
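
As a concrete instance of this template, here is a minimal sketch of a function that computes the area of a circle. The name circle_area is illustrative; the value of PI is the one used in the earlier examples.

```c
// area.c : a function with a return type, a name and one argument
// (circle_area is an illustrative name)

#define PI 3.14159

double circle_area (double radius)
{
    double area = PI * radius * radius;

    return area;
}
```

Here the return type is double, the function name is circle_area, and there is a single argument, radius, of type double. A caller might write area = circle_area (2.5); to use it.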

8.1 Scope of variables

The scope of a variable is the region of the program in which the variable is visible and can be used in a statement. Global variables have scope starting at the declaration and ending at the end of the file. Local or automatic variables are defined inside functions. The scope of such a variable is from the point of definition to the end of the block, enclosed in braces, in which it is defined; for a variable defined at the top level of a function, that is the end of the function.
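
A small sketch may help to make scope concrete. The names counter and scope_demo are hypothetical. A variable defined inside an inner block hides a global variable of the same name until the block ends.

```c
// scope.c : global and local scope
// (counter and scope_demo are hypothetical names)

int counter = 10;  // global: visible from here to the end of the file

int scope_demo (void)
{
    int result = counter;  // the global counter, 10

    {
        int counter = 3;   // local: hides the global inside this block

        result += counter; // adds 3
    }

    result += counter;     // the global counter again, adds 10

    return result;
}
```

scope_demo returns 23. Reusing a name like this is legal but easy to misread, so it is usually avoided in real programs.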

Note that arguments are passed by value ("call by value"). The calling function passes zero or more arguments. The called function gets a private copy of these arguments, typically on the stack. Even if it modifies the arguments, it modifies only its local copy, and the calling function does not see any change in the arguments it passed. This fits well with the philosophy that functions should be cohesive and have minimal linkage with other parts of the code.

If a function is required to modify a variable which is defined outside the function, a pointer to that variable can be passed to the function. Using the pointer, the called function can modify the value of that variable. A pointer to a variable is actually the address of that variable.
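
The difference between passing a value and passing a pointer can be sketched as follows. Both function names are illustrative.

```c
// byvalue.c : call by value versus passing a pointer
// (both function names are illustrative)

// Receives a copy of the caller's variable; the caller's
// variable is not affected by the assignment.
void reset_copy (int n)
{
    n = 0;
}

// Receives the address of the caller's variable and
// modifies the variable through the pointer.
void reset_through_pointer (int *n)
{
    *n = 0;
}
```

If x is 5 in the caller, it is still 5 after reset_copy (x), but it becomes 0 after reset_through_pointer (&x). This is the same use of the ampersand as in the scanf call seen earlier.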

If a variable is defined in one file, and we wish to access it from another file, we can declare it in the other file with the extern keyword. For example,

// file f1.c
....
int data [5000];
int f1 ()
{
....
....
}

// file fmain.c
....
extern int data [];
....
int main ()
{

}

We can compile and run this program, spread over two files, as shown below.

$ gcc -c f1.c
$ gcc -c fmain.c
$ gcc f1.o fmain.o -o fmain
$ ./fmain