Allegro.cc - Online Community

C Strings
Neil Roy
Member #2,229
April 2002

This is something kind of neat that I recently learned. I have programmed in C for many years (two decades now I guess, wow) and I never knew about this. I'm more of a casual programmer so I don't tend to keep up on every nuance of the language. Anyhow, this is a really neat way to do strings in C, fairly easily! I really love it!

Try this program out with a C compiler (I use GCC with Code::Blocks). It will compile and run just fine without errors.

// Great example of assigning strings in C (with at least C99).
#include <stdio.h>


typedef struct {
    char value[100];
} string;


int main(int argc, char *argv[])
{
    string a = {"hello"};
    puts(a.value);

    a = (string) {"another string!"}; // overwrite value with a new string
    puts(a.value);

    string b = {"a NEW string"};
    puts(b.value);
    b = a; // override with the value of another "string" struct
    puts(b.value); // prints "another string!" again

    return 0;
}

The output is just what you would expect: each string prints as assigned, and changing them works without a problem.

Has anyone else done this? Thoughts? Besides using C++, I prefer C only and generally compile with the "-std=gnu11" option which uses the latest 2011 C standard.

---
“I love you too.” - last words of Wanda Roy

Audric
Member #907
January 2001

It's the "simple" way of using structs directly, instead of pointers to structs that we are so used to (BITMAP*, ...)
It's only possible if the struct has a fixed size, and it will perform a memcpy of sizeof(your struct) bytes every time you assign - which is wasteful if your struct contains null-terminated strings with a lot of free space.
I've more often seen it used for light data, such as a 2D or 3D position, or a RGB triplet.

Note that you can only assign a value with {} on declaration, and only with data which is constant at compile time :
string a = {"a NEW string"};
string b = { __LINE__ };

These will not compile :

string c;
c = {"a NEW string"}; // nope

string d = {tolower("FOO")}; // nope

GullRaDriel
Member #3,861
September 2003

It's not just the strings, it's all typedefs initialisation.

Supposing you also stored the length of the string inside the struct:

/* Great example of assigning typedef structs in C (I think it works in all versions) */
#include <string.h>
#include <stdio.h>

typedef struct {
    char value[100];
    int written;
} string;

int main(int argc, char *argv[])
{
    string a = { "hello" , strlen( "hello") };
    printf( "value of a:\"%s\", len:%d\n" , a.value , a.written );
}

Edit:

Neil said:

Has anyone else done this? Thoughts? Besides using C++, I prefer C only and generally compile with the "-std=gnu11" option which uses the latest 2011 C standard.

And it would not be a problem if you don't, at least for structs initialisations.

Edit:

Last thought, this is also valid:

char a[100]="This is a REAL string initialisation";

"Code is like shit - it only smells if it is not yours"
Allegro Wiki, full of examples and articles !!

Kitty Cat
Member #2,815
October 2002

Audric said:

These will not compile :

string c;
c = {"a NEW string"}; // nope

string d = {tolower("FOO")}; // nope

Obviously the tolower one won't because it takes and returns a character, not a string. However, these will work in C99:

string s;
s = (string){"a NEW string"};

s = (string){"a REPLACEMENT string"};

Good luck with MSVC though, MS is still neglecting an 18 year old C standard. :-/

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

Neil Roy
Member #2,229
April 2002

GullRaDriel said:

It's not just the strings, it's all typedefs initialisation.

Oh for sure, but it works, and it's a much simpler, elegant solution I think. It takes some of the pain out of using strings in C.

Audric said:

string d = {tolower("FOO")}; // nope

I would think the reason that wouldn't compile is obvious. tolower() returns an int. Not a string.

Quote:

c = {"a NEW string"}; // nope

That will not compile unless you typecast it as a (string), otherwise C doesn't know the size of it, which is why we can't normally assign a string in this way. You're basically trying to assign a normal array of chars to a string struct, you need to typecast it as was already shown.

c = (string){"a NEW string"}; // yup

When you create a new integer in C, it knows the size of it, 32bits for a normal int, so it can allocate exactly that much. When you create a char in C, it's the same deal, C knows the size of char, 8bit, so it can allocate the memory needed. When you create a `char *` you are creating a pointer to memory which will contain an unknown number of characters, if you assign a string when you create it with:

char *c = "some text here";

Then C knows how much memory to assign for it given what you provided, so it allocates that much when it is created. But you cannot do that later on with this pointer: later on, C only knows that c is 32 (or 64) bits in size and points to memory. It does not know anything else, so it cannot assign a new string to that without risking overflows.

This is why you need to check the size of the string with special C string functions like strlen().

I'm not 100% clear on why structs work well for this though, but I assume many of the normal C rules for variables and pointers still apply.

If anyone has a clear reason why this works for structs, I would be interested. I would say they have a fixed size, but clearly in these examples, the strings were not a fixed size each time and it still worked. It's not enough for me to know it works, I NEED to know why. ;)

Kitty Cat said:

Good luck with MSVC though, MS is still neglecting an 18 year old C standard. :-/

This is exactly why I refuse to use MSVC. I stick to GNU C, specifically Code::Blocks with MinGW 5.3.0 at this time, with the -std=gnu11 switch so I get the GNU version of the 2011 standard. There's also -std=c11, but the GNU version fixed some problems, I forget what now.

---
“I love you too.” - last words of Wanda Roy

GullRaDriel
Member #3,861
September 2003

It's just because you had enough room in char value. If not, it would have segfaulted.

It works because string toto and string *toto = malloc(sizeof *toto) are basically the same, except that the second one is allocated by you, dynamically.

"Code is like shit - it only smells if it is not yours"
Allegro Wiki, full of examples and articles !!

Elias
Member #358
May 2000

Pascal strings always were limited to 255 characters (since they stored the length in the first byte) - and who would ever need a string longer than that anyway?

But yes, it's basically two things:

b = a;

This always works in C if a and b have the same struct type, and is identical to doing a full memcpy of the entire struct (always 100 characters in your case).

a = (string) {"another string!"};

This is a new feature introduced in C99. You can even do this:

a = (string) {.value = "another string!"};

--
"Either help out or stop whining" - Evert

Neil Roy
Member #2,229
April 2002

Fascinating stuff. I really like this though.

So, because this is a struct, the memory is deallocated at program exit? That alone could make this worth doing, at least for simpler tasks anyhow.

---
“I love you too.” - last words of Wanda Roy

Audric
Member #907
January 2001

I'm not sure if it's clear, but there are two distinct behaviors :

char *c = "some text here";
// identical to
char *c;
c = "some text here";

The only thing allocated here is a pointer, and assuming this piece of code is in a function, it is allocated on the stack, so it will be handed back when the function returns.
"some text here" is a literal array of characters. The compiler may eliminate duplicates, and stores the literal in a piece of memory that's accessible from everywhere in your program. The instructions make "c" point to this part of memory. It is typically read-only, so don't try to modify the characters.

char c[] = "ABCD";
// identical to
char c[5] = "ABCD";
// identical to
char c[5] = { 'A', 'B', 'C', 'D', '\0' };

Allocates an array of 5 characters and immediately initializes it with contents : A B C D \0
The string is writable in this case.

The difference also takes place when you do this :

typedef struct {
   char value[100];
} value_string;

typedef struct {
   char *value; // only a pointer
} ref_string;

value_string a = {"TEST"};
ref_string b = {"TEST"}; // looks like the above, doesn't it?

a.value[0] = 't'; // no problem, this is part of the struct that you declared
b.value[0] = 't'; // No! this piece of memory doesn't belong to you!

GullRaDriel
Member #3,861
September 2003

Neil said:

So, because this is a struct, the memory is deallocated at program exit? That alone could make this worth doing, at least for more simpler tasks anyhow.

Yes, it is, at exit. If your concern is memory deallocation, remember that in these modern days the OS will free everything for you if you don't.

Edit:

Audric, I think that b->value[0] would work, assuming it's initialized.

"Code is like shit - it only smells if it is not yours"
Allegro Wiki, full of examples and articles !!

Audric
Member #907
January 2001

No, the syntax that I typed is correct, but it causes a runtime error.

#include <stdio.h>

int main(void) {
    typedef struct {
        char value[100];
    } value_string;

    typedef struct {
        char *value; // only a pointer
    } ref_string;

    static const char * const_string = "TEST"; // An unrelated string which happens to have same content
    value_string a = {"TEST"};
    ref_string b = {"TEST"}; // looks like the above, doesn't it?

    a.value[0] = 'B'; // no problem, this is part of the struct that you declared

    // Addresses are cast to long so they match the %ld conversion
    printf("a: %s Address: %ld\n", a.value, (long)a.value);
    printf("b: %s Address: %ld\n", b.value, (long)b.value);
    printf("const_string: %s Address: %ld\n", const_string, (long)const_string);
    fflush(stdout);
    b.value[0] = 'F'; // No! this piece of memory doesn't belong to you!
    // If the above is uncommented and doesn't crash, const_string will be modified into "FEST"
    printf("b: %s Address: %ld\n", b.value, (long)b.value);
    printf("const_string: %s Address: %ld\n", const_string, (long)const_string);
}

Output on Jdoodle:

a: BEST Address: 140723708153920
b: TEST Address: 94500360522128
const_string: TEST Address: 94500360522128
Segmentation fault (core dumped)

I couldn't find an online compiler which didn't crash (or silently halt). Note that the last two addresses are the same: b.value and const_string are actually pointing to the same memory area.

I just wanted to raise awareness about how = {"string"} can be misleading. Depending on context, it initializes by copying the entire data, or it merely points to a piece of constant, read-only data.

Elias
Member #358
May 2000

Since I always find it fascinating, this is the actual assembly created for the Neil-strings:

string a = {"hello"};

movabsq  $478560413032, %rax // $478560413032 = 0x6f6c6c6568 = 'h', 'e', 'l', 'l', 'o', 0, 0, 0
movl  $11, %ecx           // 11 * 8 = 88        
leaq  8(%rsp), %rdi       // move address of 8 after "a" to RDI
movq  %rax, (%rsp)        // move "hello\0\0" into "a"        
movq  %rbp, %rax          // move 0 into RAX
rep stosq                   // place 88 0-bytes into "a" (RAX to RDI)
movl  $0, (%rdi)          // move remaining 4 0-bytes into "a" to make it 100

What's interesting is that there is no static string containing "hello" at all - the compiler figured out to just encode it as a number.

a = (string) {"another string!"};

movabsq  $2338042655863172705, %rax // "another "
leaq  16(%rsp), %rdi             // move address 16 after "a" to RDI
movl  $10, %ecx                  // 10 * 8 = 80
movq  %rax, (%rsp)               // move "another " into "a" 
movabsq  $113723913172083, %rax     // "string!"
movq  %rax, 8(%rsp)              // move "string!" into "a"
movq  %rbp, %rax                 // move 0 into RAX
rep stosq                          // place 80 0-bytes into "a"
movl  $0, (%rdi)                 // remaining 4 0-bytes to make it 100 again

I found it interesting that there is zero difference between initialization and assignment - and because the string does not fit into a single number it splits it into two numbers this time!

b = a;

movq  (%rsp), %rax      // move first 8 bytes of "a" into RAX
movq  %rax, 112(%rsp)   // move the earlier 8 bytes into "b"
movq  8(%rsp), %rax     // get the next 8 bytes of "a"
movq  %rax, 120(%rsp)   // and move them into "b"
movq  16(%rsp), %rax    // ...
movq  %rax, 128(%rsp)
movq  24(%rsp), %rax
movq  %rax, 136(%rsp)
movq  32(%rsp), %rax
movq  %rax, 144(%rsp)
movq  40(%rsp), %rax
movq  %rax, 152(%rsp)
movq  48(%rsp), %rax
movq  %rax, 160(%rsp)
movq  56(%rsp), %rax
movq  %rax, 168(%rsp)
movq  64(%rsp), %rax
movq  %rax, 176(%rsp)
movq  72(%rsp), %rax
movq  %rax, 184(%rsp)
movq  80(%rsp), %rax
movq  %rax, 192(%rsp)
movq  88(%rsp), %rax
movq  %rax, 200(%rsp)
movl  96(%rsp), %eax    // read the last 4 bytes of "a"
movl  %eax, 208(%rsp)   // move them into "b" for the full 100 bytes

Instead of using one of the "rep" instructions it simply uses 13 load/store pairs (12 of 8 bytes plus one of 4 bytes) for a 100 byte string.

--
"Either help out or stop whining" - Evert

Audric
Member #907
January 2001

You've made me want to check as well :)
From my tests with godbolt.org and a stack variable char mydata[] = "ABCDEFGHI";, GCC-x64 seems to favor the "unrolled loop", and clang seems to favor a kind of memcpy(), having stored the entire string in a data segment.

Elias
Member #358
May 2000

Ohh, I didn't know godbolt.org. And yes, I can see that - clang inserts an actual "call memcpy" for the "b = a", very disappointing. The icc compiler actually uses "rep movsd". Three compilers, three completely different implementations of the same "b = a;" statement :D

Now the next step would be to benchmark the different versions!

--
"Either help out or stop whining" - Evert

Neil Roy
Member #2,229
April 2002

Elias said:

Since I always find it fascinating, this is the actual assembly created for the Neil-strings:

That was fascinating. I looked at some of the assembly, but I have less experience with it than you do. I was curious as to how all of that ended up as assembly as well. So glad you posted this.

---
“I love you too.” - last words of Wanda Roy
