problems with unicode routines

The Master

Member #4,498

April 2004

Hey, I've encountered a problem regarding transference of char data from one string to another.

Consider the code:

int ParseCodeChunk( const char *pCode, int size ) {

  char Stream[10000];
  char Line[256];
  struct CodeData SomeCode[50]; // holds compiled code

  // copy the pCode to Stream

  while( we are not at the end of the script ) {

    /* get a segment of data from Stream and put it in line based
       on delimiters (eg. spaces, (, ), {, }, etc) */

    ustrcpy( SomeCode[ currentline ].SomeData, Line );

    // do something to SomeData depending on code contents

    // clear the line buffer
    memset( Line, 0, ustrlen(Line) );
    currentline++;

  }

  return OK;
}

When I run this function (the real one is a bit more complex than the above), it splits the code into Line at the delimiters correctly. I'm using MSVC .NET, so I can stop at a breakpoint and inspect the strings, and I can confirm that the values are being copied from Stream to Line properly. What I find irritating is that on the second iteration of the while loop, the code

  ustrcpy( SomeCode[ currentline ].SomeData, Line );

does absolutely nothing. Not only that, but if, as an experiment, I write:

  SomeCode[ currentline ].SomeData[0] = Line[0];

The second iteration (and every one after it) still doesn't copy any data from Line to SomeData. I would like to know why, because everything works perfectly on the first iteration.

We can only do what we feel is right each moment as we live it.

Kris Asick

Member #1,424

July 2001

You should post the entire code just in case the following doesn't help:

1. ustrlen() returns the number of characters in the string, not the number of bytes it occupies! The size passed to memset() should be sizeof(Line).

2. Remember that in Allegro's 16-bit Unicode format (U_UNICODE) one character takes up two bytes, so a char array of 256 used for Unicode can only fit 128 characters, including the null terminator. In the default UTF-8 format a character takes one or more bytes.

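3. Use ustrzcpy() instead. It takes the size of the destination buffer in bytes and always null-terminates, so it can't write past the end of SomeData.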
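Points 1 to 3 can be seen without Allegro by modelling a fixed 16-bit string format with uint16_t. This is a hypothetical sketch: u16_len, u16_size_z and u16_zcpy are made-up stand-ins for ustrlen(), a byte-size query and ustrzcpy(), not Allegro functions, and a 128-element uint16_t array models the 256-byte char buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a fixed 16-bit string format (like U_UNICODE):
   each character is one uint16_t, terminated by a 0 unit. */

/* Number of characters, like ustrlen(): NOT the number of bytes. */
size_t u16_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Bytes occupied by the string, including its terminator. */
size_t u16_size_z(const uint16_t *s)
{
    return (u16_len(s) + 1) * sizeof(uint16_t);
}

/* Bounded copy modelled on ustrzcpy(dest, size, src): size is the size of
   dest in BYTES; copies as much as fits and always terminates dest. */
uint16_t *u16_zcpy(uint16_t *dest, size_t size, const uint16_t *src)
{
    size_t max_chars, i;

    assert(size >= sizeof(uint16_t));          /* room for the terminator */
    max_chars = size / sizeof(uint16_t) - 1;   /* reserve one unit for 0 */
    for (i = 0; i < max_chars && src[i] != 0; i++)
        dest[i] = src[i];
    dest[i] = 0;
    return dest;
}
```

With a 256-byte buffer holding "hello", memset(Line, 0, u16_len(Line)) clears only 5 of the 12 bytes the string occupies, leaving later characters in place, while memset(Line, 0, sizeof(Line)) clears all of it. Copying a 200-character string with u16_zcpy(Line, sizeof(Line), src) stops at 127 characters plus the terminator instead of overrunning the buffer.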

--- Kris Asick (Gemini)
--- http://www.pixelships.com