Strings in C

  • Post last modified: February 27, 2024

Strings

A string is a sequence of characters. In the C language, a string is terminated by the null character, '\0'. An empty string, that is, a string with zero characters, consists of just the null character.

There are two kinds of strings in C. The first is the variable string. The contents of a variable string can change during the execution of the program. The second is the constant string, which stays constant and does not change during the execution of the program.

1.0 Variable Strings

A variable string is basically a character array. We can define a variable string as,

#define SIZE 10
char mystring [SIZE];

This defines a character array mystring, which can store a string. The only requirement is that mystring should have the null character, '\0', somewhere in it. The string starts at index 0 and extends up to the first null character. A function processing a string like mystring does not look at the array size, SIZE; it keeps on processing till it encounters the null character. In fact, in most cases, a function processing a string does not know the size of the array holding the string. Its brief is just to go on processing till it encounters the terminating null character.
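
The scan-until-null behaviour described above can be sketched with a small length function. The name stringlen is made up for this illustration; it is meant to mirror what the library function strlen does, not to replace it.

```c
#include <stddef.h>

// stringlen: hypothetical helper, counts the characters before the
// first '\0'. It never looks at the size of the underlying array.
size_t stringlen (const char *s)
{
    size_t n = 0;

    while (s [n] != '\0')
        n++;
    return n;
}
```

For an array defined as char mystring [SIZE] = "Hello", a call such as stringlen (mystring) stops at index 5, no matter how large SIZE is.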

The length of a string is the number of characters before the terminating null character, '\0'. So the size of the array containing a string must be at least the length of the largest string that would be stored in it, plus one byte for the null character. In the above example, the array mystring can store strings of length 0 to 9 characters. Trying to store a string whose length is greater than the array size minus one writes past the end of the array. This is a serious bug and the program code must take adequate care to prevent it.
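
One way to guard against this bug, sketched below, is to pass the array size along with the array and refuse strings that do not fit. The helper name store_string and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

// store_string: hypothetical helper. Copies src into dest only if
// src, plus its terminating '\0', fits in dest_size bytes.
// Returns 0 on success and -1 if the string is too long.
int store_string (char *dest, size_t dest_size, const char *src)
{
    size_t len = strlen (src);

    if (len + 1 > dest_size)
        return -1;                  // would overflow dest; copy nothing
    memcpy (dest, src, len + 1);    // len characters plus the '\0'
    return 0;
}
```

With char mystring [10], a call like store_string (mystring, sizeof (mystring), "123456789") succeeds, while a 10-character source string is rejected and mystring is left unchanged.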

There are multiple ways to initialize a variable string. For example,

#include <stdio.h>
#include <string.h>

int main (int argc, char *argv [])
{
    #define SIZE 10
    char mystring [SIZE] = "Hello";
    char mystring1 [4] = {'H', 'E', 'L', 'O'}; // Error: no room for '\0'
    char mystring1a [5] = {'H', 'E', 'L', 'O', '\0'}; // Corrected
    char mystring2 [] = {'H', 'E', 'L', 'O'}; // Error: no '\0', size is 4
    char mystring2a [] = {'H', 'E', 'L', 'O', '\0'}; // Corrected
    char mystring3 [] = "Hello World!";
    char buffer [100];

    // strlen and sizeof yield size_t, which is printed with %zu.
    // mystring1 and mystring2 have no '\0'; using them as strings reads
    // past the end of the array (undefined behavior, shown on purpose).
    printf ("mystring = %s strlen (mystring) = %zu sizeof (mystring) = %zu\n", mystring, strlen (mystring), sizeof (mystring));
    printf ("mystring1 = %s, strlen (mystring1) = %zu sizeof (mystring1) = %zu (Error)\n", mystring1, strlen (mystring1), sizeof (mystring1));
    printf ("mystring1a = %s, strlen (mystring1a) = %zu sizeof (mystring1a) = %zu (Corrected)\n", mystring1a, strlen (mystring1a), sizeof (mystring1a));
    printf ("mystring2 = %s, strlen (mystring2) = %zu sizeof (mystring2) = %zu (Error)\n", mystring2, strlen (mystring2), sizeof (mystring2));
    printf ("mystring2a = %s, strlen (mystring2a) = %zu sizeof (mystring2a) = %zu (Corrected)\n", mystring2a, strlen (mystring2a), sizeof (mystring2a));
    printf ("mystring3 = %s, strlen (mystring3) = %zu sizeof (mystring3) = %zu\n", mystring3, strlen (mystring3), sizeof (mystring3));
    strcpy (buffer, "All is well!");
    printf ("buffer = %s strlen (buffer) = %zu sizeof (buffer) = %zu\n", buffer, strlen (buffer), sizeof (buffer));
    sprintf (buffer, "mystring = %s", mystring);
    printf ("buffer = \"%s\" strlen (buffer) = %zu sizeof (buffer) = %zu\n", buffer, strlen (buffer), sizeof (buffer));

    return 0;
}

We can compile and run the above program as,

$ make try1
cc -c   try1.c -o try1.o
cc   try1.o   -o try1
$ ./try1
mystring = Hello strlen (mystring) = 5 sizeof (mystring) = 10
mystring1 = HELOHELOHELO, strlen (mystring1) = 12 sizeof (mystring1) = 4 (Error)
mystring1a = HELO, strlen (mystring1a) = 4 sizeof (mystring1a) = 5 (Corrected)
mystring2 = HELOHELO, strlen (mystring2) = 8 sizeof (mystring2) = 4 (Error)
mystring2a = HELO, strlen (mystring2a) = 4 sizeof (mystring2a) = 5 (Corrected)
mystring3 = Hello World!, strlen (mystring3) = 12 sizeof (mystring3) = 13
buffer = All is well! strlen (buffer) = 12 sizeof (buffer) = 100
buffer = "mystring = Hello" strlen (buffer) = 16 sizeof (buffer) = 100

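The library function strlen returns the length of its argument string. The arrays mystring1 and mystring2 hold four characters but no terminating null character, so they are not strings. Passing them to strlen, or to printf with %s, makes those functions read past the end of the array, which is undefined behavior; the output shown for them (HELOHELOHELO and HELOHELO) is simply what happened to lie in the adjacent memory on this run and may differ on another system. The definitions of mystring1a and mystring2a correct this by adding the '\0'. The cardinal rule is that the array must be at least one byte larger than the length of the string, so that there is room for the terminating null character. The definitions of mystring and mystring3 are correct and are the preferred ways of initializing strings. The use of a large general buffer with the strcpy and sprintf library functions is also quite common.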
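
When the general buffer might be too small for the formatted text, the standard snprintf function takes the buffer size and never writes past it. The following is a sketch only; the helper name format_label is an assumption made for this example.

```c
#include <stdio.h>
#include <stddef.h>

// format_label: hypothetical helper. Writes "mystring = <value>" into buf.
// Returns 1 if the whole text fit, 0 if it was truncated or an error occurred.
int format_label (char *buf, size_t size, const char *value)
{
    int n = snprintf (buf, size, "mystring = %s", value);

    // snprintf returns the length the full text would have had,
    // so n >= size means the output was cut short.
    return n >= 0 && (size_t) n < size;
}
```

Unlike sprintf, a truncated result is still a valid string, because snprintf always writes the terminating null character when size is greater than zero.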

2.0 Creating strings using dynamic memory allocation

Quite often, the size of a string is known only at run time. For example, suppose we are reading data from an input file and need to make a string of the data read. In such cases, it is common to dynamically allocate memory for the string and copy the string into it. For example, the program below reads a line from the standard input, finds the length of the string read, allocates memory using malloc and copies the string into it. When the string is no longer required, the allocated memory is freed.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char buf [100];
int main (int argc, char *argv [])
{
    char *ptr;

    while (fgets (buf, sizeof (buf), stdin) != NULL) {
        // remove newline from buf (check len so that we never index buf [-1])
        size_t len = strlen (buf);
        if (len > 0 && buf [len - 1] == '\n')
            buf [len - 1] = '\0';

        // Create string
        // allocate space, +1 for the null char
        if ((ptr = malloc (strlen (buf) + 1)) == NULL) {
            perror ("malloc"); exit (1);
        }
        strcpy (ptr, buf);

        printf ("buf = \"%s\", ptr = \"%s\"\n", buf, ptr);

        free (ptr);
    }
}
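
The allocate-and-copy step in the program above can be packaged as a small helper, sketched below. The name stringdup is hypothetical and chosen for this illustration.

```c
#include <stdlib.h>
#include <string.h>

// stringdup: hypothetical helper. Returns a newly allocated copy of s,
// or NULL if malloc fails. The caller must free the result.
char *stringdup (const char *s)
{
    size_t size = strlen (s) + 1;   // +1 for the null char
    char *copy = malloc (size);

    if (copy != NULL)
        memcpy (copy, s, size);     // copies the '\0' as well
    return copy;
}
```

Returning NULL on failure leaves the decision to the caller, which can report the error with perror and exit, as the program above does, or recover in some other way.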

3.0 Constant Strings

A constant string, or string literal, is a string that is constant and does not change during the program execution. For example,

char *cstring = "This is a constant string";

Here, cstring is a pointer to char that points to the constant string, “This is a constant string”. A constant string must not be modified; attempting to modify it results in undefined behavior. We can have the compiler reject such modifications by using the keyword const.

const char * cstring = "This is a constant string";

Now the characters of the string are designated as constant and a statement like cstring [0] = 'K' generates a compile-time error. It is important to note that the string is made constant but the pointer, cstring, is still a variable and can be made to point to some other string. If that is done and no other pointer refers to the original string, “This is a constant string”, we can no longer access it in the program. To prevent that, we need to declare the pointer cstring as a constant as well. That is,

const char * const cstring = "This is a constant string";

Now the pointer cstring is itself a constant and any attempt to change it results in a compile-time error.
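
The difference between a constant string and a variable string holding a copy of it can be sketched as follows. The names greeting and make_greeting are assumptions made for this example.

```c
#include <stddef.h>

// Constant pointer to a constant string: neither can be changed.
static const char * const greeting = "hello";

// make_greeting: hypothetical helper. Copies the constant string into the
// caller's array, then changes the first character of the copy only.
// out must have room for at least 6 bytes ("hello" plus '\0').
void make_greeting (char *out)
{
    size_t i = 0;

    while ((out [i] = greeting [i]) != '\0')
        i++;
    out [0] = 'H';              // fine: out is a variable string
    // greeting [0] = 'H';      would not compile: the characters are const
    // greeting = "bye";        would not compile: the pointer is const
}
```

After the call, the caller's array holds Hello while the constant string still reads hello.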

4.0 Example: stringcp function to copy strings

We have an example function, stringcp, which illustrates various string concepts. The concept of a string is based on arrays, pointers and the sentinel null character at the end of the string. The function stringcp copies the source string to the destination. It is like the strcpy library function and, like strcpy, it assumes that the destination array is large enough. First, we have the array version, which relies on the fact that a pointer can be indexed like an array.

#include <stdio.h>
#include <string.h>

// stringcp: copy src to dest, array version
void stringcp (char *dest, const char *src)
{
    int i = 0;

    while ((dest [i] = src [i]) != '\0') {
        i++;
    }
}

char buf1 [100];
char buf2 [100];
int main (int argc, char *argv [])
{
    while (fgets (buf1, sizeof (buf1), stdin) != NULL) {
        // remove newline from buf1 (check len so that we never index buf1 [-1])
        size_t len = strlen (buf1);
        if (len > 0 && buf1 [len - 1] == '\n')
            buf1 [len - 1] = '\0';
        stringcp (buf2, buf1);
        printf ("source = \"%s\", dest = \"%s\"\n", buf1, buf2);
    }
}

Next we have the pointer version.

// stringcp: copy src to dest, pointer version 1
void stringcp (char *dest, const char *src)
{
    while ((*dest = *src ) != '\0') {
        src++; dest++;
    }
}

We can make stringcp a little shorter by using the postfix increment operator on the pointers in the condition of the while loop.

void stringcp (char *dest, const char *src)
{
    while ((*dest++ = *src++) != '\0')
        ;
}

The expression in parentheses evaluates to a non-zero value for every character except the terminating null character, '\0'. So we can skip the comparison with '\0' in the condition of the while loop. The final version of stringcp is given below.

void stringcp (char *dest, const char *src)
{
    while (*dest++ = *src++)
        ;
}
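
The same pointer idiom carries over to other string functions. As a sketch, the function below appends one string to another; the name stringcat is hypothetical, and, as with stringcp, the caller is assumed to provide a destination array that is large enough.

```c
// stringcat: hypothetical helper, appends src to the string in dest.
// dest must have room for both strings plus the terminating '\0'.
void stringcat (char *dest, const char *src)
{
    while (*dest)                   // advance to the '\0' that ends dest
        dest++;
    while ((*dest++ = *src++))      // copy src, including its '\0'
        ;
}
```

The extra pair of parentheses around the assignment does not change its meaning; many compilers treat it as a signal that the assignment inside the condition is intentional.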


Karunesh Johri

Software developer, working with C and Linux.