String library functions in C

  • Post last modified: April 26, 2024

String library functions

The string library in C provides functions to operate on strings. Strings are terminated by the null character, and there are two kinds of string manipulation functions. Functions of the first kind rely on the terminating null character to find the end of the given string. Functions of the second kind are the “number” versions of some of these functions; they work on at most the first n characters of the given string, where n is an argument to the function. The important string functions are strlen, strcat, strcpy, strcmp, strstr and strtok. The library also has memory-related functions, like memset and memcpy, which work on a given number of bytes.

A program using any of the string manipulation functions needs to include the string header file, <string.h>.

1.0 String length – strlen

#include <string.h>

size_t strlen (const char *s);

The strlen function returns the length of the string pointed to by s. The length is the number of bytes in the string, not counting the terminating null character.

2.0 Concatenate two strings – strcat

#include <string.h>

char *strcat (char *dest, const char *src);

char *strncat (char *dest, const char *src, size_t n);

The strcat function concatenates, i.e., appends, src to the dest string. In the process, strcat overwrites the terminating null character of the dest string with the first character of src, writes the rest of the src string and terminates the resulting string with a null character. The strncat function does the same, except that if there are more than n characters in the src string, only the first n characters of src are appended to the dest string. After writing characters from the src string, strncat always terminates the dest string with the null character, so it may write up to n + 1 bytes. In both cases, the strings must not overlap, and the dest buffer must have enough space for the existing dest string, the appended characters and the null character. Both functions return a pointer to the resulting dest string.

3.0 String copy – strcpy

#include <string.h>

char *strcpy (char *dest, const char *src);

char *strncpy (char *dest, const char *src, size_t n);

The strcpy function copies the src string to dest. All the characters of src, including the terminating null byte, are copied to dest. The src and dest strings must not overlap and the dest buffer must have enough space to accommodate a copy of the src string. The strncpy function works like strcpy, except that at most n bytes are copied. If there is no null byte in the first n bytes of src, the dest string is not null terminated. However, if the length of the src string is less than n bytes, strncpy writes additional null bytes to ensure that a total of n bytes are written. Both strcpy and strncpy return a pointer to the dest string.

4.0 String compare – strcmp

#include <string.h>

int strcmp (const char *s1, const char *s2);

int strncmp (const char *s1, const char *s2, size_t n);

strcmp compares the two strings s1 and s2 and returns an integer, which is zero if s1 and s2 are equal, negative if s1 is less than s2 and positive if s1 is greater than s2. The strings are compared byte by byte, with the bytes treated as unsigned char values, so the result follows the character encoding (for example, ASCII); the locale's LC_COLLATE setting is not taken into consideration. The strncmp function works just like strcmp, except that at most the first n characters of the two strings are compared.

5.0 Find substring – strstr

#include <string.h>

char *strstr (const char *haystack, const char *needle);

The strstr function finds the first occurrence of the substring needle in the string haystack. While searching, the terminating null character of needle is not compared. A pointer to the first occurrence of needle in haystack is returned. If the substring is not found, NULL is returned.

6.0 Get tokens from string – strtok

#include <string.h>

char *strtok (char *str, const char *delim);

The strtok function breaks the string str into tokens, based on the delimiter characters specified in the string delim, and returns a token. The string delim has one or more characters and any of these characters can delimit tokens. Normally, a series of strtok calls is made and each call returns the next token. The string str is passed as the first parameter in the first call to strtok, and NULL is passed as the first parameter in subsequent calls. A token is a null terminated non-empty string. The delimiters are ignored at the start and the end of the string and are never a part of the returned token. A run of consecutive delimiters is treated as a single delimiter. It is permissible to pass different delimiters in successive calls to strtok. When no token is found, NULL is returned. For example, the program below breaks the string str, using the period as the delimiter, to get three tokens.

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
    char str [] ="Give every man thine ear, but few thy voice. Take each man's censure, but reserve thy judgement. - William Shakespeare";
    const char *delim1 = ".";
    char *token;

    token = strtok (str, delim1);

    while (token) {
        printf ("%s\n", token);
        token = strtok (NULL, delim1);
    }

    return 0;
}

$ make try
gcc     try.c   -o try
$ ./try
Give every man thine ear, but few thy voice
 Take each man's censure, but reserve thy judgement
 - William Shakespeare

It is important to note that the strtok function is not reentrant, and it is not thread-safe either. It remembers its position in the string being parsed internally between calls. If a fresh call to strtok is made before the earlier string is fully processed, the previous context is lost. For example, suppose we wish to further break each token into sub-tokens based on the comma delimiter. We may modify our earlier code as given below, but it does not work.

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv) // Error
{
    char str [] ="Give every man thine ear, but few thy voice. Take each man's censure, but reserve thy judgement. - William Shakespeare";
    const char *delim1 = ".";
    const char *delim2 = ",";
    char *token1, *token2;

    token1 = strtok (str, delim1);

    while (token1) {
        printf ("token1 = %s\n", token1);
        token2 = strtok (token1, delim2);
        while (token2) {
            printf ("token2 = %s\n", token2);
            token2 = strtok (NULL, delim2);
        }
        token1 = strtok (NULL, delim1);
    }
    return 0;
}

$ make try
gcc     try.c   -o try
$ ./try
token1 = Give every man thine ear, but few thy voice
token2 = Give every man thine ear
token2 =  but few thy voice

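We get the first token1, delimited by a period, correctly. But, inside the while loop, we make a fresh call to strtok with token1 as the string and comma as the delimiter. This call replaces the context that strtok had saved for the original string. We get token2 correctly from token1, but we are not able to get token1 any more from the original string, because the next call with NULL continues from the end of the first token1 and finds nothing. We need a reentrant version of strtok, which is strtok_r.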

char *strtok_r (char *str, const char *delim, char **saveptr);

strtok_r has an additional parameter, saveptr. It is a pointer to a character pointer. The address of a character pointer variable is passed as the third argument to strtok_r. It is used internally by strtok_r to save the context of the string being parsed between successive calls. So, the calling program should not modify this variable between calls and, for a given string str, should pass the address of the same variable in every call. strtok_r is specified by POSIX; it is not a part of the standard C library.

#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
    char str [] ="Give every man thine ear, but few thy voice. Take each man's censure, but reserve thy judgement. - William Shakespeare";
    const char *delim1 = ".";
    const char *delim2 = ",";
    char *token1, *token2;
    char *sptr1, *sptr2;

    token1 = strtok_r (str, delim1, &sptr1);

    while (token1) {
        printf ("token1 = %s\n", token1);
        token2 = strtok_r (token1, delim2, &sptr2);
        while (token2) {
            printf ("token2 = %s\n", token2);
            token2 = strtok_r (NULL, delim2, &sptr2);
        }
        token1 = strtok_r (NULL, delim1, &sptr1);
    }
    return 0;
}

$ make try
gcc     try.c   -o try
$ ./try
token1 = Give every man thine ear, but few thy voice
token2 = Give every man thine ear
token2 =  but few thy voice
token1 =  Take each man's censure, but reserve thy judgement
token2 =  Take each man's censure
token2 =  but reserve thy judgement
token1 =  - William Shakespeare
token2 =  - William Shakespeare

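Both strtok and strtok_r modify the first argument, str; they overwrite the delimiter at the end of each token with a null character. So, these functions cannot be used with constant strings, such as string literals.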

7.0 Initialize memory with a constant byte – memset

#include <string.h>

void *memset (void *s, int c, size_t n);

memset is a commonly used library function. It fills the first n bytes of the memory pointed to by s with the byte c, where c is converted to unsigned char. Most of the time, it is used to initialize memory with null characters or string buffers with spaces. memset does not add a terminating null character of its own. It returns s, the pointer to the memory modified by it. For example,

memset (s, '\0', m);  // initialize m bytes with null char at s

memset (str, ' ', n);  // initialize n bytes with space char at str

8.0 Copy memory area – memcpy

#include <string.h>

void *memcpy (void *dest, const void *src, size_t n);

memcpy copies n bytes from the memory pointed to by src to the memory pointed to by dest. There must not be any overlap between the src and dest memory areas; if the areas may overlap, memmove should be used instead. memcpy returns the pointer to the dest memory area.

9.0 See also

Strings in C

Karunesh Johri

Software developer, working with C and Linux.