0

I am writing small IRC Bot, and i need to split incoming messages for easier handling. I wrote a function get_word, which should split string. According to gdb and valgrind, problem is that function sometimes returns invalid pointer, and program fails when trying to free that pointer. Here is the code:

char **get_word(char *str) {
   char **res;
   char *token, *copy;
   int size = 1;
   for(int i = 0; str[i] != '\0'; i++) {
      if(str[i] == ' ') {
         while(str[i] == ' ') {
            i++;
         }
         size++;
      }
   }
   res = malloc((size + 1) * sizeof(char *));
   copy = strdup(str);
   token = strtok(copy, " ");
   for(int i = 0; token != NULL; i++) {
      res[i] = strdup(token);
      token = strtok(NULL, " ");
   }
   free(copy);
   res[size] = NULL;
   return res;
}
5
  • 2
    C has a wonderful feature where you can code /* and some words and then */. If you insert these "comments" into your code at various points it makes it far easier for you to understand what you're doing, and also makes it easier for other people.
    – Hot Licks
    Commented Sep 25, 2011 at 16:04
  • Shouldn't you also be returning the number of words?
    – mpenkov
    Commented Sep 25, 2011 at 16:05
  • 2
    @misha no, because res is NULL terminated
    – Kijan
    Commented Sep 25, 2011 at 16:10
  • Could str ever end with a space?
    – Dmitri
    Commented Sep 25, 2011 at 16:44
  • Please put the valgrind log. It may help us to help you.
    – eyalm
    Commented Sep 25, 2011 at 16:45

4 Answers 4

3

One problem I see is with your nested loops:

Consider this input: ' \0'

The function execution reaches the for loop, i == 0. Then the while loop is also entered. At the end of while loop i == 1. Now the incrementation statement from the for loop is executed and i == 2. So next you will be reading past the end of the string.

EDIT

I understand that size is the number of words found in the input. So I'd go for something like:

for (int i = 0; str[i] != '\0'; ++i) {
    if (str[i] != ' ' && (str[i + 1] == ' ' || str[i + 1] == '\0')) {
         // Counting endings of words
         size++;
    }
}
1
  • So what is your suggestion? I guess i should use something like while(str[i+1] == ' ') insted ?
    – Kijan
    Commented Sep 25, 2011 at 16:11
0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **split (const char *str)
{
char **arr = NULL;
size_t cnt=0;
size_t pos, len;

arr = malloc (sizeof *arr);
for (pos = 0; str[pos]; pos += len) {
    char *dup;
    len = strspn(str+pos, " \t\r\n" );
    if (len) continue;

    len = strcspn(str+pos, " \t\r\n" );
    if (!len) break;

    arr = realloc (arr, (2+cnt) * sizeof *arr);
    dup = malloc (1+len);
    memcpy (dup, str+pos, len);
    dup [len] = 0;
    arr [cnt++] = dup;
    }
arr[cnt] = NULL;
return arr;
}

int main(int argc, char **argv)
{
char **zzz;

for( zzz = split( argv[1] ); *zzz; zzz++) {
    printf( "->%s\n", *zzz );
    }
return 0;
}

The reallocation is a bit clumsy (like in the OP) and improving it is left as an exercise to the reader 8-}

4
  • 1
    Well that's not helpful at all. using realloc() so many times is just too slow. I already wrote something like this some time ago, and it was even shorter, but that's not what I'm trying to achieve.
    – Kijan
    Commented Sep 25, 2011 at 18:59
  • As I said: it is suboptimal (clumsy) but correct. BTW: your "size" variable counts the number of characters, not words. That is also suboptimal :-) Commented Sep 25, 2011 at 19:08
  • No, size does count number of words. It is increased every time when 1 or more adjacent spaces are found. :)
    – Kijan
    Commented Sep 25, 2011 at 19:47
  • Does it handle tabs, too? But: on second thought: yuo're righ. My method could simply be augmented by introducing a preloop to count the number of words (+ the sum of their sizes); just like yours does. But it still is correct; even in its current form. Commented Sep 25, 2011 at 20:54
0

As julkiewicz points out, your nested loops where you count the words could miss the terminating null on str. Additionally, if str ends with spaces, your current code would count an extra word.

You could replace this section:

int size = 1;
for(int i = 0; str[i] != '\0'; i++) {
   if(str[i] == ' ') {
      while(str[i] == ' ') {
         i++;
      }
      size++;
   }
}

..with something like this:

while (*str == ' ') str++;  // skip leading spaces on str
/* count words */
int size = 0;
char *s = str;
do {
  if (*s && *s != ' ') {
    size++;                       // non-space group found
    while (*s && *s != ' ') s++;  // skip to next space
  }
  while (*s == ' ') s++;          // skip spaces after words
} while (*s);

..which counts the starts of groups of non-space characters, rather than groups of spaces, and watches for the terminating null in the inner loops.

You might also consider changing:

for(int i = 0; token != NULL; i++) {

..to:

for(int i = 0; token && i < size; i++) {

..just as a slightly paranoid guard in case strtok finds more words than you counted (though it shouldn't).

-2

I think gdb may be complaining about the fact that you never check the return value of malloc (or strdup) to see if it's non-NULL.

1
  • -1: this doesn't really qualify as a useful answer - gdb doesn't care whether you check return values or not
    – Paul R
    Commented Sep 25, 2011 at 16:23

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.