I have this C structure:
To clear it, I call this function:
Assuming that I create a new object:
What will happen to the call to free() if the garbage found within my_string (more specifically, my_string->array) is non-NULL, or (even worse) just happens to point to a memory location that I actually did malloc(), so it's a "valid" allocation address (ref), but not one that I actually want to free?
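The original structure and clear function weren't preserved, so here is a hypothetical sketch of what they might look like. The field names `array`, `size` and `capacity` and the function name `s_string_clear` are assumptions, based on the later description of "two unsigned ints and one pointer". The sketch shows the rule the question hinges on: free() is only safe on NULL or on a pointer from malloc()/calloc()/realloc() that hasn't been freed yet, so the object has to be initialized before the clear function ever sees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout: one heap pointer plus two unsigned counters. */
typedef struct s_string {
    char *array;           /* heap buffer, or NULL when nothing is allocated */
    unsigned int size;     /* length of the text currently stored */
    unsigned int capacity; /* bytes allocated for array */
} s_string;

/* Release the buffer and put the object back into its empty state.
   Precondition: str->array is NULL or a live malloc/calloc/realloc result. */
void s_string_clear(s_string *str)
{
    assert(str != NULL);
    free(str->array);  /* free(NULL) is a no-op, so no NULL check is needed */
    str->array = NULL; /* avoids a dangling pointer and a later double free */
    str->size = 0;
    str->capacity = 0;
}
```

Declaring `s_string str = {0};` (or calling an init function first) makes the first clear safe. A plain `s_string str;` with automatic storage holds an indeterminate pointer, and passing that to free() is undefined behavior, whether it crashes or quietly frees someone else's block.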
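Passing free() a pointer that didn't come from malloc()/calloc()/realloc() is undefined behavior. If the value points to memory you haven't allocated, the allocator will usually abort or the OS will kill your program (I just tried with a random integer and it segfaulted), but nothing guarantees that. If it points to memory you have allocated but didn't intend to free, free() will quietly release it, and your program may segfault (or silently corrupt data) later when that block is used again.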
You just need to define an API to use with the string. Either require string_init on stack-allocated strings, or use string_create to allocate heap-allocated strings; both would act like a constructor and set the structure's members to valid values.
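A minimal sketch of that API, assuming `s_string` has `array`, `size` and `capacity` members. The signatures of `string_init` and `string_create`, and the matching `string_destroy`, are hypothetical rather than taken from any existing code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct s_string {
    char *array;
    unsigned int size;
    unsigned int capacity;
} s_string;

/* "Constructor" for strings that live on the stack or inside another struct. */
void string_init(s_string *str)
{
    assert(str != NULL);
    str->array = NULL;
    str->size = 0;
    str->capacity = 0;
}

/* "Constructor" for heap-allocated strings; returns NULL if malloc fails. */
s_string *string_create(void)
{
    s_string *str = malloc(sizeof *str);
    if (str != NULL)
        string_init(str);
    return str;
}

/* Matching "destructor" for string_create(): frees the buffer and the container. */
void string_destroy(s_string *str)
{
    if (str == NULL)
        return;
    free(str->array);
    free(str);
}
```

string_init() suits stack or embedded objects; anything obtained from string_create() must go through string_destroy(), which releases both the text buffer and the container itself.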
You're depending on malloc() to zero out the memory (for security reasons)? This 64-bit Linux I'm using doesn't do that anymore, for some reason; calloc() still works as expected. Or is string_create some C++ thing that does it?
[EDIT]
Never mind, I just woke up. I must have been thinking of allocating an array of pointers.
You're depending on malloc() to zero out the memory?
Me? I was relying on my own API function to do it... Perhaps it should be called s_string_create instead:
I would also write my own API functions for access, manipulation, and destruction to ensure that the structure was always valid (assuming no bugs in the API).
So have one function that zeroes everything, and another that zeroes everything after some malloc/calloc/realloc has already been used to assign memory on the heap?
...then (unchanged from above)...
which is only used after I've used up the string and want to deallocate memory before leaving a function or whatever.
I would also write my own API functions for access, manipulation, and destruction to ensure that the structure was always valid (assuming no bugs in the API).
Doing this. This is actually WHY I'm writing these functions: everything is currently "let's directly access/manipulate/destroy the strings." I'm tired of everyone doing it their "own way" which can lead to hard-to-debug bugs.
malloc() isn't guaranteed to return zeroed memory.
Tests at work have shown that until you actually write to the allocated array, the OS is likely to just create an entry in the memory pool without reserving physical memory for it.
For that purpose you should use calloc.
Plus, using strncpy() to copy "\0" into a char array is just silly: use memcpy(), or calloc() it directly.
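For illustration, the two ways of getting a usable empty buffer, plus a length-aware copy. The helper names `alloc_empty`, `alloc_zeroed` and `dup_with_length` are made up for this sketch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate room for n characters and make it a valid empty C string.
   Only the first byte needs to be '\0'; the rest may stay uninitialized. */
char *alloc_empty(size_t n)
{
    char *buf = malloc(n + 1);
    if (buf != NULL)
        buf[0] = '\0';
    return buf;
}

/* Alternative: calloc zeroes every byte, which costs time for large n. */
char *alloc_zeroed(size_t n)
{
    return calloc(n + 1, 1);
}

/* Copy text whose length is already known: memcpy plus one terminator,
   instead of strncpy, which also pads the rest of the buffer with zeros. */
char *dup_with_length(const char *text, size_t len)
{
    assert(text != NULL || len == 0);
    char *buf = malloc(len + 1);
    if (buf != NULL) {
        memcpy(buf, text, len);
        buf[len] = '\0';
    }
    return buf;
}
```

A C string only needs its terminator; zeroing the remaining bytes with calloc() is extra work if the length is tracked separately. strncpy() pads the whole rest of the destination with '\0', which is why it is a poor fit for this.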
I don't care whether the array is 0'd or not. In fact, if the char* array is allocated with some huge size, it's faster to simply allocate the memory with malloc and keep track of the string length with size (which is actually what I'm doing) than to calloc it and wait while the contents are 0'd out.
Another option I could use is:
This requires that I remember to always 0-out the array before calling free_string(), but it reduces redundant code:
free_string should also free the given container. You're not going to use only static s_string declarations, are you?
The s_string object is being used for a reusable string location, similar to std::string. If you want to see the full code and the test app for it, it's in the Spoiler.
I don't allocate and free the s_string container, since I want it to be more-or-less persistent. When a long string is given, it should expand the array's size as needed. But when a shorter string is given, instead of freeing the memory, it simply replaces the text and sets the size to fit it.
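The grow-only reuse described above could be sketched like this. It assumes `capacity` counts allocated bytes (including the terminator) and `size` counts characters; `s_string_set_text` is a hypothetical name, not the poster's actual function:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef struct s_string {
    char *array;
    unsigned int size;     /* characters in use, excluding the '\0' */
    unsigned int capacity; /* bytes allocated, including room for '\0' */
} s_string;

/* Replace the contents with text. The buffer only grows: a shorter string
   reuses the existing allocation. Returns 0 on success, -1 on failure
   (the old contents are left untouched in that case). */
int s_string_set_text(s_string *str, const char *text)
{
    assert(str != NULL && text != NULL);
    size_t len = strlen(text);
    if (len >= UINT_MAX)
        return -1; /* would not fit in the unsigned int fields */
    if (len + 1 > str->capacity) {
        /* Grow only when needed; realloc(NULL, n) behaves like malloc(n). */
        char *bigger = realloc(str->array, len + 1);
        if (bigger == NULL)
            return -1; /* old buffer is still valid and owned by str */
        str->array = bigger;
        str->capacity = (unsigned int)(len + 1);
    }
    /* memmove rather than memcpy in case text points into str->array. */
    memmove(str->array, text, len + 1); /* includes the '\0' */
    str->size = (unsigned int)len;
    return 0;
}
```

Because shorter strings reuse the existing allocation, repeated set calls on the same object stop reallocating once the buffer is large enough; the memory is only returned when the object is cleared.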
So you'll never use an s_string *str?
Not typically, no. I suppose I would use them if I needed an array of strings, similar to std::vector<std::string>, but since the struct only takes two unsigned ints and one pointer on the stack, I'm not too worried about putting it on the heap at this point. Only s_string.array is ever allocated on the heap.
In that case everything seems ok to me :-)
I've been wanting to write my own string API for C and C++ also (to make strings convenient, like in modern languages). I might add the following to what you already have (I would also add the type_ prefix to what you have to avoid name collisions):
Too much?
bamccaig, scrolling through your list quickly, I can see an immediate use for concatenation. A lot of the others, though, would be useful mainly for a full-fledged API like you're suggesting, since it would help protect the data members (like C++'s private: and protected: keywords do).
Using printf or its variants could be done directly on the array member of the item with what I'm looking at, as data protection isn't as big of a concern for me.
I don't know if I'll need any special handling for UTF-8/UTF-16 or other Unicode formats, so if I find that things are breaking, I may have to change my habits. Using the "gettext.h" library, a lot of the strings I'm passing into these functions will be converted to the appropriate type, though I'm not sure whether all of those 0x00 bytes throughout will mean a rewrite of this code, or if it'll "just work". Perhaps I'll have to replace strlen() with a Unicode-aware variant?
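Some background for the strlen() question: UTF-8 never uses a 0x00 byte except to encode U+0000 itself, so strlen() still finds the end of a UTF-8 string correctly, but it counts bytes, not characters. UTF-16 text, by contrast, contains 0x00 bytes for ASCII characters, so char-based functions like strlen() stop early on it. A minimal sketch of a code point counter, assuming the input is already valid UTF-8 (`utf8_length` is a hypothetical name and does no validation):

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_length(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80) /* lead byte or plain ASCII */
            ++count;
    }
    return count;
}
```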
Instead of manually setting to 0/NULL all the items in the init function, just use memset. Lots easier
May as well use calloc in that case, then. Even easier. Just be careful because NULL isn't guaranteed to have a 0'd out bit pattern.
And FYI, freeing NULL is valid, so it's useless to check the pointer for non-NULL before calling free.
I don't want to 0 out all of the memory within the array when I'm freeing it: only the pointer needs to be set to 0/NULL.
The string arrays don't necessarily need to be 0'd out: if I store my size variable (a lot like std::string), what's the point of using memset/calloc to set all the remaining array elements to 0? I loop from 0..array.size and nothing more, so there's no out-of-bounds access (and if I'm dumb enough to TRY to access out-of-bounds data, I deserve the resulting SEGFAULT).
True. But what's faster: testing a register for 0, or calling a function which will just return immediately (because it runs the same "if NULL" test)?
May as well use calloc in that case, then. Even easier. Just be careful because NULL isn't guaranteed to have a 0'd out bit pattern.
You can't calloc a statically allocated struct
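Assigning from a zero-initialized compound literal covers both concerns in this exchange: it works on a stack or static object (where calloc() can't help), and it sets `array` to a real null pointer even on a platform where NULL isn't all-bits-zero, which memset() doesn't guarantee. A sketch, assuming the same hypothetical `s_string` layout and a C99 compiler:

```c
#include <assert.h>
#include <stddef.h>

typedef struct s_string {
    char *array;
    unsigned int size;
    unsigned int capacity;
} s_string;

/* Reset every member to its "zero" value. Unlike memset(str, 0, ...), this
   sets array to a genuine null pointer regardless of NULL's bit pattern,
   and it works the same for stack, static, and heap objects. */
void s_string_init(s_string *str)
{
    assert(str != NULL);
    *str = (s_string){0};
}
```

At the point of declaration, `s_string str = {0};` does the same job. On a C89 compiler, copying from a `static const s_string empty = {0};` gives the same effect without a compound literal.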
bamccaig, scrolling through your list quickly, I can see an immediate use for concatenation. A lot of the others, though, would be useful mainly for a full-fledged API like you're suggesting, since it would help protect the data members (like C++'s private: and protected: keywords do).
Using printf or its variants could be done directly on the array member of the item with what I'm looking at, as data protection isn't as big of a concern for me.
I just think the safety you get from relying on accessors outweighs the minor performance gain you get by directly accessing the member. Of course, it's easy to access it manually, but humans make mistakes, and eventually that can lead to a hard-to-track-down bug. I'd rather go with safe/defensive programming until a particular bottleneck has been identified with an actual tool, at which point that particular part of the code can be optimized. Direct access is still possible for when you need it.
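A sketch of what such accessors might look like, assuming the same hypothetical `s_string` layout; `s_string_length` and `s_string_cstr` are illustrative names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct s_string {
    char *array;
    unsigned int size;
    unsigned int capacity;
} s_string;

/* Read-only accessors: callers go through these instead of touching the
   members, so the invariants are enforced in one place. */
unsigned int s_string_length(const s_string *str)
{
    assert(str != NULL);
    return str->size;
}

/* Always returns a valid C string, even before anything was allocated. */
const char *s_string_cstr(const s_string *str)
{
    assert(str != NULL);
    return str->array != NULL ? str->array : "";
}
```

If a profiler ever does point at these calls, they can be moved into the header as `static inline` functions without changing any caller.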
I don't know if I'll need any special handling for UTF-8/UTF-16 or other Unicode formats, so if I find that things are breaking, I may have to change my habits. Using the "gettext.h" library, a lot of the strings I'm passing into these functions will be converted to the appropriate type, though I'm not sure whether all of those 0x00 bytes throughout will mean a rewrite of this code, or if it'll "just work". Perhaps I'll have to replace strlen() with a Unicode-aware variant?
Now that you mention it, I think it would be nice to have an API for manipulating UTF-8 strings in C and C++. There'd be a minor performance penalty, but overall the flexibility of UTF-8 would be very convenient to have. I think if I ever do tackle a string data type (in C and/or C++) I'd want it to be UTF-8 based.
True. But what's faster: testing a register for 0, or calling a function which will just return immediately (because it runs the same "if NULL" test)? 
Depends if the variable's already in a register or not, whether it's in cache or not, and how likely it is for the test to fail. While the check could avoid a quick function call, it could just as easily incur an extra check (and branches aren't cheap).
Not that you should be allocating and freeing in speed-sensitive code, so it shouldn't matter much either way. In that case, I'd simply ask why duplicate what the standard already guarantees for you.
I've added a branch to libbam to work on a UTF-8 compatible string type.
Wish me luck (and time and energy).
Actually, why not just use what's already been done in Allegro?
I wanted to pull some sort of printf or sprintf functionality, and from the 4.2 (and 4.4) sources, I found that in src/text.c, the textprintf() function just calls uvszprintf() from within the src/unicode.c file, and that calls its own helper files, and so on and so forth.
The only "problem" or "limitation" I saw with what they did is that the character buffer has to be allocated ahead of time (something like "char buf[512];", which you pass in as one of the arguments to uvszprintf), whereas I would want that buffer to grow to fit the situation dynamically (like a std::string would allow).
But if you're going to be using the Allegro library anyway, I guess it's pointless to reinvent the wheel: just use what's already been done and tested (cross-platform, and with bugs worked out by a whole community, to boot!).
So now I'm adding a new function that creates that "buf[]" on the stack (maybe I can still do it on the heap... ::ponders:: ), and after it's been filled by the formatted string, just call set_text() to actually set the string.
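A sketch of a growable formatting helper using standard C99 vsnprintf() rather than Allegro's uvszprintf(). With a NULL buffer and a size of 0, vsnprintf() returns the number of characters the output needs, so the buffer can be sized exactly. `format_alloc` is a hypothetical name, and unlike Allegro's routines it does no Unicode format conversion; it works on plain bytes:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a heap buffer sized to fit, like sprintf but without a
   fixed "char buf[512]". The caller must free() the result. */
char *format_alloc(const char *fmt, ...)
{
    assert(fmt != NULL);
    va_list args;

    va_start(args, fmt);
    int needed = vsnprintf(NULL, 0, fmt, args); /* measuring pass */
    va_end(args);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    va_start(args, fmt); /* a consumed va_list must be restarted */
    vsnprintf(buf, (size_t)needed + 1, fmt, args);
    va_end(args);
    return buf;
}
```

The returned string could then be passed to set_text() and freed. Some older, pre-C99 runtimes don't return the required length from vsnprintf(), so this relies on a conforming C library.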
A lot of running around and hoop-jumping, but it'll be faster to continue on with the rest of the programming and come back to this and "perfect" the guts at a later time. After all, that is the whole reason for OOP's code-hiding...
EDIT: bamccaig: looks like your src/bam_string.c is a little bare
The header file has a LOT more to look at
Yeah, it's a WIP[1].
I'm sort of in the mood to approach the source tonight, but I expect I'll hit some crippling road block[2] long before it's working and then put it off for a few months.
Hopefully that doesn't happen. I'm sure there's a lot to learn by implementing a UTF-8 string. Unicode has interested me for a long time now and it's nice in languages like C# where you just automatically have support. It would be sweet to have a similar luxury in C and C++.
