Allegro.cc - Online Community

const char *? -- more n00bisms
Milan Mimica
Member #3,877
September 2003

No, I don't see the difference. The memory block returned by .data() doesn't need to be freed either, and the pointer becomes invalid when the string is destroyed.

Goalie Ca
Member #2,579
July 2002

The way I read it, c_str() may return a copy of the string in C format (which is why the spec emphasizes that it doesn't need to be freed). data() is just a pointer to the heap memory that the string actually uses. It does not need to be null-terminated and can do all kinds of other things.

-------------
Bah weep granah weep nini bong!

Michael Jensen
Member #2,870
October 2002

Quote:

Assuming above is the issue you're talking about I've run into it and dealt with it.

Dustin, that's sort of my problem, but not really -- and it IS present in GCC. It's not really the compiler's fault though. You can't use the char * returned from data or c_str after the object has gone out of scope!

Quote:

So the problem is that sprintf() doesn't put '\0' at the end, or is printf() ignoring it for some reason?

No, sprintf works just fine, and will put a Null Terminator at the end of your string for you; What happens with Dustin's example is that he's declaring a string that goes out of scope (and the internal data is therefore freed) before he even does anything with it. It probably only works in GCC 95% of the time because the pointer is freed before it passes the data on to the next consumer, and the next consumer is just getting lucky the majority of the time that the same data is still there.

I have noticed though that MSVC is amazing as far as what it lets me do with OOP as opposed to GCC which just finds endless reasons to complain at me for really obscure things. :-(

ImLeftFooted
Member #3,935
October 2003

Quote:

You can't use the char * returned from data or c_str after the object has gone out of scope!

This sentence shows that you've missed a piece of understanding in my posts.

Quote:

Dustin's example is that he's declaring a string that goes out of scope (and the internal data is therefore freed) before he even does anything with it. It probably only works in GCC 95% of the time because the pointer is freed before it passes the data on to the next consumer, and the next consumer is just getting lucky the majority of the time that the same data is still there.

I politely request that you rethink your logic and then make a new post with something useful in it.

Michael Jensen
Member #2,870
October 2002

Quote:

I politely request that you rethink your logic and then make a new post with something useful in it.

Yeah, that was polite. ::)

Quote:

The solution to this problem is to stop using MSVC and start using GCC. MSVC just sucks like that.

Look, I understand you hate MSVC, but it's just not the problem I'm having. MSVC has proven to be just as good as, if not better than, GCC to me. The problem I had occurred in GCC also, and it makes plain sense: You can't use pointers to structures/classes (or to members of such structures/classes) that have been freed/deleted (which happens automatically to an object on the stack when it goes out of scope...). You just can't.

The compiler can't do things like "MyFunction((4+x) * c)" or "MyFunction(x(), y(), z())" in a single step;
it rearranges the code so you end up with things like

temp = 4 + x;
temp *= c;
MyFunction(temp);

or

temp1 = x();
temp2 = y();
temp3 = z();
MyFunction(temp1, temp2, temp3);

In your example you create a string by passing in a char * (we'll call it A).
That string creates another char * of its own (we'll call it B), and then copies the contents of A into B...

The compiler reorganizes your code for optimization of course, and at the last point in time that you use your variable that was allocated on the stack, it gets popped off (its destructor is called and its memory is freed.)

The function that you're passing B into (we'll call this F) needs a reference to B of course, which is gotten right before F is called. Once it has the pointer to B returned by c_str, the string is no longer needed and it's popped off the stack, its destructor is called, and in the process B is freed, then the function is called and the pointer to B (now freed) is passed in. Just because you have a pointer to a piece of data inside an object does not mean that the object won't get popped off the stack after its last usage. If it's an object you allocated, THEN it won't be deallocated until you delete it, but if it's on the stack the compiler can freely reorder when its destructor is called to anywhere after its last usage.

It all makes perfect sense, here's what I believe to be the equivalent:

char * myref = (char*)malloc(100); // creating the new string (B)
sprintf(myref, "Something"); //creating the new string (copying A into B)
char * temp = myref; //The object's function returned a reference to B 
                     // that we can pass in to F (c_str is called)
free(myref);  //Call the object's destructor since it's 
              // not used again after this point... (B is freed)
MyFunction(temp); //call our function (F) now passing in a reference to B.

Kitty Cat
Member #2,815
October 2002

Quote:

Once it has the pointer to B returned by c_str, the string is no longer needed and it's popped off the stack, its destructor is called, and in the process B is freed

It depends on when the string is supposed to go out of scope. When it's no longer referenced in a command, or when the current command is finished? Does
printf("%s", string("foo").c_str());
become

{
    string tmp("foo");
    const char *arg = tmp.c_str();
    const char *s = "%s";
    printf(s, arg);
}

or

{
    const char *arg;
    {
        string tmp("foo");
        arg = tmp.c_str();
    }
    const char *s = "%s";
    printf(s, arg);
}

?

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

Michael Jensen
Member #2,870
October 2002

You understand exactly what I meant, KC (and actually I wouldn't be surprised if you knew about some obscure C standard that said it should be one way or the other -- I have quite a high opinion of you...)

As far as I can guess, #2 in your example would be what MSVC might be up to, while #1 is what GCC could be doing. From the information provided in this thread I would assume that the MSVC optimizer tries to remove objects from the stack the first chance it gets (optimized for low memory consumption?) while the GCC one lets them stagnate until right before the scope is lost (optimized for speed)?

For other readers I just want to emphasize the huge difference between:

MyFunction(string("Xyz"));

And

MyFunction(string("Xyz").c_str());

In the second function the string is declared (apparently) in the scope of the actual function call, and since you only need the result of one of the object's deterministic functions in order to call MyFunction, once the result is had, you would ideally no longer need the initial object (the string).

Kitty Cat
Member #2,815
October 2002

Quote:

From the information provided in this thread I would assume that the MSVC optimizer tries to remove objects from the stack the first chance it gets (optimized for low memory consumption?) while the GCC one lets them stagnate until right before the scope is lost (optimized for speed)?

It depends on what the standard says. If it defines any behavior for this case, then one compiler or the other is out of spec. If it's the optimizer causing it, then the optimizer is broken (but I have a feeling it's just how the two compilers handle it, regardless of optimization). I don't know what, if anything, the spec says on the issue.

Quote:

In the second function the string is declared (apparently) in the scope of the actual function call, and since you only need the result of one of the object's deterministic functions in order to call MyFunction, once the result is had, you would ideally no longer need the initial object (the string).

Actually, the difference is that in the first one, a temp string is made that is copied to another string on the stack, which remains in scope for the given function call as a parameter (if you want to avoid the copy, then pass an object the class can be constructed from; as long as the parameter is not a non-const reference). It doesn't matter if the temp goes out of scope after it's copied to the one on the stack or not because it's never used after that.

The second one returns an internal pointer, and that is copied to the stack as a parameter. This does matter with scope as the returned pointer will only remain valid as long as the string object is valid.

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

ImLeftFooted
Member #3,935
October 2003

Quote:

You can't use pointers to structures/classes (or to members of such structures/classes) that have been freed/deleted (which happens automatically to an object on the stack when it goes out of scope...) You just can't.

Your words are getting more passionate, maybe you believe you can make something true by saying it more?

Your passion is keeping you from seeing the bigger picture. Rules defined by a particular language or compiler are simply that, just rules. There is no hand of god enforcing them and you don't even have to agree with them.

I would like to challenge your statement by saying this: Why must the returned string be a "members of such structures/classes". What rule defines this, and where did it come from?

Milan Mimica
Member #3,877
September 2003

http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap18.html

"18.7. Temporary Objects

Port. Rec. 16
Do not write code which is dependent on the lifetime of a temporary object.

Temporary objects are often created in C++, such as when functions return a value. Difficult errors may arise when there are pointers in temporary objects. Since the language does not define the life expectancy of temporary objects, it is never certain that pointers to them are valid when they are used."
See the example too.

However, I'm not sure how trustworthy a reference this article is because it also says: "Do not assume that an object is initialized in any special order in constructors." while I know there IS a special order.

Michael Jensen
Member #2,870
October 2002

Quote:

I would like to challenge your statement by saying this: Why must the returned string be a "members of such structures/classes". What rule defines this, and where did it come from?

Buh? Are you fucking mad? Who taught you to program?

the standard string class is just implemented that way -- it has an internal pointer to a char *.

When a string class is destructed (you can delete it, or it can fall off the stack) the char * is freed. No one forced them to do it that way, it just WAS done that way, and now you have to live with it.

When you pass a reference to that char * into your function, a reference to a char * that has already been freed, the old data is not guaranteed to be there. It's not safe.

Quote:

Your words are getting more passionate, maybe you believe you can make something true by saying it more?

Me having passion for my hobby has nothing to do with it -- I think you're projecting though. Maybe you believe you can make something true by saying complete nonsense. That's your deal, count me out. -- The whole C++ standard is defined somewhere -- I think the last one was in the 90s and the next one is coming out in 2009 (or so they say)... -- If you really need proof of concept go look it up -- as for me, my code works in GCC and MSVC.

Quote:

However, I'm not sure how trustworthy a reference this article is because it also says: "Do not assume that an object is initialized in any special order in constructors." while I know there IS a special order.

But is it the same order in every compiler? ... Someone posted some really neat code a while back that produced different results on different compilers because the C standards didn't specify an order for something and each compiler was doing it its own way. I found it really interesting.

nonnus29
Member #2,606
August 2002

For any non-trivial project it is completely reasonable and even correct to impose a convention such as this: The caller is responsible for freeing allocated memory.

Quote:

IMO, the better way to do it is to avoid C strings at all, if possible.

C strings are easy if you understand how they work. They're not exactly convenient however.

Milan Mimica
Member #3,877
September 2003

Quote:

But is it the same order in every compiler?

The order is prescribed by the standard. In my experience gcc-3.x and newer and MSVC8 use the order in which the members are declared. My code relies heavily on the order of initialization and I have never had a problem with it. Moreover, some things cannot be written without knowing the order, like a class with all const members.
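
A minimal sketch of the "class with all const members" case, assuming the declaration-order rule described above. `Rect` and `rect_area` are hypothetical names invented for this illustration, not code from anyone's project.

```cpp
// Hypothetical class, used only to illustrate declaration-order initialization.
class Rect {
public:
    // The initializer list is written in declaration order on purpose:
    // members are initialized in the order they are declared in the class,
    // so width and height already hold their values when area is computed.
    Rect(int w, int h) : width(w), height(h), area(width * height) {}

    const int width;
    const int height;
    const int area;  // declared last, so it may safely read width and height
};

// Small helper (also hypothetical) that builds a Rect and reports its area.
int rect_area(int w, int h) {
    Rect r(w, h);
    return r.area;
}
```

If `area` were declared before `width` and `height`, the same initializer list would read two uninitialized members, which is why the declaration order matters here.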

ImLeftFooted
Member #3,935
October 2003

Quote:

the standard string class is just implemented that way -- it has an internal pointer to a char *.

Am I correct in assuming the idea that std::string has an internal pointer to a char is your reasoning for why the return value for std::string::c_str will become invalid after std::string::~string?

If that is so I'd ask you to remain calm, focused, and go through this line of reasoning and look for possible holes.

I tried to give you a hint which did not seem to work, so I will reword it. Does the standard require that the return value for c_str be part of std::string and deallocated with std::string::~string? Where are you getting the idea this is required? Do you feel comfortable trusting your source?

Michael Jensen
Member #2,870
October 2002

Quote:

I tried to give you a hint which did not seem to work, so I will reword it. Does the standard require that the return value for c_str be part of std::string and deallocated with std::string::~string? Where are you getting the idea this is required? Do you feel comfortable trusting your source?

If that's your argument then it has nothing to do with the compiler anyway, it's your standard library. My point is, something has to free the memory returned by c_str somewhere if it's allocated somewhere otherwise you'd get memory leaks. Microsoft does it in a perfectly logical place, the destructor. It's really a horrible practice to retain a pointer to something an object gave you after the object is disposed especially if it's not documented to work like that.

Let me ask you this, Dustin: If there's no standard defining where the pointer should go (what scope it should have), or when it's allocated/destroyed, then why write code that's dependent on your STL being implemented one undocumented way over another? This game of yours is dumb; if this is what you had meant the whole time then you should have just come out and said it. I now think you a fair bit less insane than I thought you to be last night, but I also now know that you're talking about something completely different.

I stepped through the disposing of a basic_string and _Tidy is called in the destructor which deallocates "any storage" -- in the MSVC version. (_Built = true, _Newsize = 0).

void __CLR_OR_THIS_CALL _Tidy(bool _Built = false,
    size_type _Newsize = 0)
    {   // initialize buffer, deallocating any storage
    if (!_Built)
        ;
    else if (_BUF_SIZE <= _Myres)
        {   // copy any leftovers to small buffer and deallocate
        _Elem *_Ptr = _Bx._Ptr;
        if (0 < _Newsize)
            _Traits_helper::copy_s<_Traits>(_Bx._Buf, _BUF_SIZE, _Ptr, _Newsize);
        _Mybase::_Alval.deallocate(_Ptr, _Myres + 1);
        }
    _Myres = _BUF_SIZE - 1;
    _Eos(_Newsize);
    }

Furthermore, supporting Dustin's claims:

  char * m = NULL;
  {
    string x = "42";
    m = (char*)x.c_str();
  }

  printf("%s\n", m);

Works in GCC (g++ -v = 3.4.2), but not MSVC 8. Though there's no guarantee that it will work in any given compiler, because as Dustin pointed out, there is no rule or standard that says that what c_str returns can't be destroyed in x's destructor.

gillius
Member #119
April 2000

This thread seems a lot more complicated than it needs to be. I must have missed why you can't just return std::string -- but I'm going to assume you have to return a const char*. There are three, and only three options that I can see:

  1. Make the function return std::string or some other object encapsulating the string

  2. Tell the user they must call delete[] or free on the returned pointer

  3. Have the user provide the memory as parameters (buffer and length)

EDIT: if I sound silent on the c_str issue, it's because you can't keep around a pointer obtained from c_str after deleting the string it comes from. That's just a fact. If you say "new X; X = something; delete X; print X;" and X happens to be "something" then it just happens to be that way. I'm not even sure c_str is valid after modifying the string.
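
The three options listed above could look roughly like this. It is only a sketch: the function names and the "player one" text are made up for illustration, and option 2 assumes the documented contract is delete[] rather than free.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Option 1: return an object that owns the characters.
std::string name_as_string() {
    return "player one";
}

// Option 2: return a raw pointer; the caller must delete[] it.
char *name_as_new_array() {
    const char *src = "player one";
    std::size_t needed = std::strlen(src) + 1;  // include the terminator
    char *copy = new char[needed];
    std::memcpy(copy, src, needed);
    return copy;
}

// Option 3: the caller provides the buffer and its length.
// Returns false (and writes nothing) if the buffer is missing or too small.
bool name_into_buffer(char *buffer, std::size_t length) {
    const char *src = "player one";
    std::size_t needed = std::strlen(src) + 1;
    if (buffer == NULL || length < needed)
        return false;
    std::memcpy(buffer, src, needed);
    return true;
}
```

In each case it is clear who owns the memory: the returned object, the caller, or the caller's own buffer.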

Gillius
Gillius's Programming -- https://gillius.org/

Paul Pridham
Member #250
April 2000

Simple non-reentrant C. If the user wants to keep the string around after the call, he should make a copy.

#include <stdlib.h>
#include <string.h>

static char *buffer_ = NULL;

void deinit_a_string(void)
{
    free(buffer_);      /* free(NULL) is a no-op */
    buffer_ = NULL;     /* avoid a double free if called again */
}

const char *get_a_string(void)
{
    deinit_a_string();

    buffer_ = strdup("Jive turkey!");

    return buffer_;
}

And yes, you can use your std::string and keep that as the static variable instead.
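
A sketch of that std::string variant, for comparison. `get_a_string_cpp` is a hypothetical name chosen so it doesn't clash with the C version; it is still non-reentrant, and the returned pointer is only good until the next call.

```cpp
#include <string>

// Hypothetical std::string counterpart of get_a_string() above.
const char *get_a_string_cpp() {
    // The static string owns the memory and releases it at program exit,
    // so no deinit function is needed.
    static std::string buffer;

    buffer = "Jive turkey!";   // overwritten on every call
    return buffer.c_str();     // valid until buffer is modified again
}
```

A caller that wants to keep the text past the next call should still copy it, for example into its own std::string.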

ImLeftFooted
Member #3,935
October 2003

The above method is one way to implement it. Another could be a compiler extension to perform cleanup at the 'semicolon'*. The first method loses thread safety of course, while the second method is dependent on implementations of other sections of the compiler and possibly much harder.

I don't know how GCC handles this. After some quick googling there are some issues about std::string on multi-core machines. Maybe GCC has not implemented this feature very elegantly or even on purpose.

Quote:

why write code that's dependent on your STL being implemented one undocumented way over another?

I have no qualms locking into a given compiler.

*In an accurate world, this term requires a clearer definition.

Michael Jensen
Member #2,870
October 2002

Quote:

This thread seems a lot more complicated than it needs to be. I must have missed why you can't just return std::string

I was writing a function that overloads a virtual in the base class that returns a char * (I had no control over that.)

I have new methods that return strings, but for the old methods to work, they have to return char *. I decided that I'd just keep the string around as a private variable for the class, which works great.
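
Roughly what that approach might look like. This is a guess at the shape, not the actual code: `LegacyBase`, `NamedItem`, `GetName`, and `GetNameString` are all hypothetical names, and the base method is assumed to return const char * here.

```cpp
#include <string>

// Hypothetical stand-in for the base class that could not be changed.
class LegacyBase {
public:
    virtual ~LegacyBase() {}
    virtual const char *GetName() = 0;
};

class NamedItem : public LegacyBase {
public:
    explicit NamedItem(const std::string &name) : name_(name) {}

    // New-style method: returns an owning std::string.
    std::string GetNameString() const { return "Item: " + name_; }

    // Old-style method: the result is stored in a private member, so the
    // returned pointer stays valid until the next GetName() call or until
    // this object is destroyed.
    const char *GetName() override {
        cached_ = GetNameString();
        return cached_.c_str();
    }

private:
    std::string name_;
    std::string cached_;  // keeps the characters alive for old-style callers
};
```

The pointer's lifetime is now tied to the object rather than to a temporary, which is the property the old interface needs.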

Quote:

if I sound silent on the c_str issue, it's because you can't keep around a pointer obtained from c_str after deleting the string it comes from. That's just a fact. If you say "new X; X = something; delete X; print X;" and X happens to be "something" then it just happens to be that way. I'm not even sure c_str is valid after modifying the string.

Exactly. That's all I'm trying to say. Just because version X of GCC's standard library lets you do it doesn't mean it's a good thing.

Milan Mimica
Member #3,877
September 2003

ISO 14882:2003 12.2 , "after the end of full expression sequence point, a sequence of zero or more invocations of destructor functions for temporary objects takes place"

A sequence point is a semicolon, of course.

printf("%s\n", std::string("blah").c_str()); //is valid
The temporary object is destroyed at the end of the full expression, that is, after printf returns.

Here's a test program so you can test your broken compilers:

#include <iostream>

class Some {
public:
    ~Some() {
        std::cout << "dtor called!" << std::endl;
    }
};

void Get(const Some &s) {
    std::cout << "dream on" << std::endl;
}

int main() {
    Get(Some());

    return 0;
}
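
A variant of the test above, as a rough sketch: it records the order of construction, call, and destruction in a log instead of printing, and it passes a c_str()-style pointer rather than a reference. `Tracked`, `consume`, `safe_use`, and `events` are names made up for this illustration, with `Tracked` standing in for std::string. It assumes a compiler that follows the rule quoted from 12.2.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical event log, used only for this illustration.
static std::vector<std::string> events;

// Hypothetical stand-in for std::string that records its lifetime.
class Tracked {
public:
    explicit Tracked(const char *text) : text_(text) { events.push_back("ctor"); }
    ~Tracked() { events.push_back("dtor"); }
    const char *c_str() const { return text_.c_str(); }

private:
    std::string text_;
};

// Plays the role of MyFunction: it only ever sees a raw pointer.
std::size_t consume(const char *s) {
    events.push_back("call");
    return std::strlen(s);
}

// The temporary is destroyed at the end of the full expression,
// so the pointer is still valid while consume() runs.
std::size_t safe_use() {
    events.clear();
    std::size_t n = consume(Tracked("Xyz").c_str());
    events.push_back("after");
    return n;
}
```

On a conforming compiler, `safe_use()` returns 3 and leaves ctor, call, dtor, after in `events`, in that order. Storing the pointer and reading it after that statement ends is a different matter, since the temporary is already gone by then.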

ImLeftFooted
Member #3,935
October 2003

Well there you have it

Johan Halmén
Member #1,550
September 2001

It would help if we stopped calling char* a string. char* is a pointer to a char. The char that it points to might in memory be followed by another, and another, and finally by '\0'. Some handy C functions might treat such a bunch of chars as a string (like in BASIC).

But such a bunch of data instances must always be treated with care, especially if you want to pass such a bunch from a function to its calling function. Because the bunch is not passed, only the pointer to the memory location of the first data instance. That is, if your function returns, say, a char*.

If we insist on talking about C strings, we must be aware of when the C string contains valid data and when it becomes invalid. And remember, invalid data != garbage. If the function returns a valid C string, the string must be deleted or freed later.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Years of thorough research have revealed that the red "x" that closes a window, really isn't red, but white on red background.

Years of thorough research have revealed that what people find beautiful about the Mandelbrot set is not the set itself, but all the rest.

Michael Jensen
Member #2,870
October 2002

Quote:

A sequence point is a semicolon, of course.

wikipedia said:

Consider two functions f() and g(). The + operator is not a sequence point and therefore in the expression f()+g(), it is possible that either f() or g() will be executed first. The comma operator is a sequence point, and therefore in f(),g() the order of evaluation is defined (i.e., first f() is called, and then g() is called). The type and value of the expression is that of g(); the value of f() is discarded.

When you call function F, each term is evaluated independently BEFORE the function is called.

Example:

F(x(), y(), z());

is rendered as:

t1 = x();
t2 = y();
t3 = z();
F(t1,t2,t3);

Kitty Cat
Member #2,815
October 2002

Quote:

The comma operator is a sequence point, and therefore in f(),g() the order of evaluation is defined (i.e., first f() is called, and then g() is called).

But is the comma in a parameter list the same thing? Parameters are pushed onto the stack right-to-left, which is different from the left-to-right operation of the comma operator. It's not like other characters don't have multiple meanings (*, <, >, ., etc).

Example:

#include <stdio.h>

int main()
{
    int a = 0;
    /* should give 0, 1, 2 with left-to-right order */
    printf("%d, %d, %d\n", a++, a++, a++);
    return 0;
}

$ gcc -W -Wall tmp.c -o tmp
tmp.c: In function ‘main’:
tmp.c:6: warning: operation on ‘a’ may be undefined
tmp.c:6: warning: operation on ‘a’ may be undefined
$ ./tmp
2, 1, 0

So, it's clearly not the same as the comma operator.
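
The same distinction can be sketched without the undefined `a++, a++, a++` expression by logging which function runs first. `f`, `g`, `add`, `trace`, and the two wrapper functions are hypothetical names for this illustration only.

```cpp
#include <string>

// Hypothetical trace of the order in which f() and g() actually ran.
static std::string trace;

int f() { trace += 'f'; return 1; }
int g() { trace += 'g'; return 2; }
int add(int a, int b) { return a + b; }

// Comma OPERATOR: f() is evaluated first, then g(); the result is g()'s value.
int comma_operator_result() {
    trace.clear();
    int value = (f(), g());
    return value;
}

// Comma as argument SEPARATOR: both calls finish before add() runs,
// but the language does not say which of the two goes first.
int argument_list_result() {
    trace.clear();
    return add(f(), g());
}
```

After `comma_operator_result()` the trace is always "fg". After `argument_list_result()` it may be "fg" or "gf" depending on the compiler, while the sum is 3 either way.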

--
"Do not meddle in the affairs of cats, for they are subtle and will pee on your computer." -- Bruce Graham

Michael Jensen
Member #2,870
October 2002

Wow, and I thought that was a smoking gun. KC you're a badass. ;)

So, to bring my ignorance to its full measure: what else are commas used for in C++ besides parameter lists? I really can't think of anything... maybe when initializing arrays or something...

edit:
so in my above example I got the order wrong, it should be:

Example:

F(x(), y(), z());

is rendered as:

t3 = z();
t2 = y();
t1 = x();
F(t1, t2, t3);

no?
