C string split problem

Question

I am writing small IRC Bot, and i need to split incoming messages for easier handling. I wrote a function get_word, which should split string. According to gdb and valgrind, problem is that function sometimes returns invalid pointer, and program fails when trying to free that pointer. Here is the code:

char **get_word(char *str) {
   char **res;
   char *token, *copy;
   int size = 1;
   for(int i = 0; str[i] != '\0'; i++) {
      if(str[i] == ' ') {
         while(str[i] == ' ') {
            i++;
         }
         size++;
      }
   }
   res = malloc((size + 1) * sizeof(char *));
   copy = strdup(str);
   token = strtok(copy, " ");
   for(int i = 0; token != NULL; i++) {
      res[i] = strdup(token);
      token = strtok(NULL, " ");
   }
   free(copy);
   res[size] = NULL;
   return res;
}

C has a wonderful feature where you can code /* and some words and then */. If you insert these "comments" into your code at various points it makes it far easier for you to understand what you're doing, and also makes it easier for other people. — Hot Licks, Commented Sep 25, 2011 at 16:04

julx · Accepted Answer · 2011-09-25 16:26:26Z

3

One problem I see is with your nested loops:

Consider this input: ' \0'

The function execution reaches the for loop, i == 0. Then the while loop is also entered. At the end of while loop i == 1. Now the incrementation statement from the for loop is executed and i == 2. So next you will be reading past the end of the string.

EDIT

I understand that size is the number of words found in the input. So I'd go for something like:

for (int i = 0; str[i] != '\0'; ++i) {
    if (str[i] != ' ' && (str[i + 1] == ' ' || str[i + 1] == '\0')) {
         // Counting endings of words
         size++;
    }
}

edited Sep 25, 2011 at 16:26

answered Sep 25, 2011 at 16:03

julx

9,0818 gold badges50 silver badges87 bronze badges

So what is your suggestion? I guess i should use something like while(str[i+1] == ' ') insted ?
– Kijan
Commented Sep 25, 2011 at 16:11

Add a comment |

wildplasser · Accepted Answer · 2011-09-25 17:03:56Z

0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **split (const char *str)
{
char **arr = NULL;
size_t cnt=0;
size_t pos, len;

arr = malloc (sizeof *arr);
for (pos = 0; str[pos]; pos += len) {
    char *dup;
    len = strspn(str+pos, " \t\r\n" );
    if (len) continue;

    len = strcspn(str+pos, " \t\r\n" );
    if (!len) break;

    arr = realloc (arr, (2+cnt) * sizeof *arr);
    dup = malloc (1+len);
    memcpy (dup, str+pos, len);
    dup [len] = 0;
    arr [cnt++] = dup;
    }
arr[cnt] = NULL;
return arr;
}

int main(int argc, char **argv)
{
char **zzz;

for( zzz = split( argv[1] ); *zzz; zzz++) {
    printf( "->%s\n", *zzz );
    }
return 0;
}

The reallocation is a bit clumsy (like in the OP) and improving it is left as an exercise to the reader 8-}

answered Sep 25, 2011 at 17:03

wildplasser

44.2k9 gold badges71 silver badges116 bronze badges

1

Well that's not helpful at all. using realloc() so many times is just too slow. I already wrote something like this some time ago, and it was even shorter, but that's not what I'm trying to achieve.
– Kijan
Commented Sep 25, 2011 at 18:59
As I said: it is suboptimal (clumsy) but correct. BTW: your "size" variable counts the number of characters, not words. That is also suboptimal :-)
– wildplasser
Commented Sep 25, 2011 at 19:08
No, size does count number of words. It is increased every time when 1 or more adjacent spaces are found. :)
– Kijan
Commented Sep 25, 2011 at 19:47
Does it handle tabs, too? But: on second thought: yuo're righ. My method could simply be augmented by introducing a preloop to count the number of words (+ the sum of their sizes); just like yours does. But it still is correct; even in its current form.
– wildplasser
Commented Sep 25, 2011 at 20:54

Add a comment |

Dmitri · Accepted Answer · 2011-09-25 18:18:18Z

As julkiewicz points out, your nested loops where you count the words could miss the terminating null on str. Additionally, if str ends with spaces, your current code would count an extra word.

You could replace this section:

int size = 1;
for(int i = 0; str[i] != '\0'; i++) {
   if(str[i] == ' ') {
      while(str[i] == ' ') {
         i++;
      }
      size++;
   }
}

..with something like this:

while (*str == ' ') str++;  // skip leading spaces on str
/* count words */
int size = 0;
char *s = str;
do {
  if (*s && *s != ' ') {
    size++;                       // non-space group found
    while (*s && *s != ' ') s++;  // skip to next space
  }
  while (*s == ' ') s++;          // skip spaces after words
} while (*s);

..which counts the starts of groups of non-space characters, rather than groups of spaces, and watches for the terminating null in the inner loops.

You might also consider changing:

for(int i = 0; token != NULL; i++) {

..to:

for(int i = 0; token && i < size; i++) {

..just as a slightly paranoid guard in case strtok finds more words than you counted (though it shouldn't).

jwodder · Accepted Answer · 2011-09-25 16:05:29Z

-2

I think gdb may be complaining about the fact that you never check the return value of malloc (or strdup) to see if it's non-NULL.

answered Sep 25, 2011 at 16:05

jwodder

57.3k12 gold badges115 silver badges134 bronze badges

-1: this doesn't really qualify as a useful answer - gdb doesn't care whether you check return values or not
– Paul R
Commented Sep 25, 2011 at 16:23

Add a comment |

Collectives™ on Stack Overflow

C string split problem

4 Answers 4

Your Answer

Not the answer you're looking for? Browse other questions tagged
c
string
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged cstring or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
c
string
or ask your own question.