Sane string conversion in C++11

Edgar Reynaldo

For some reason, strings are the most neglected class in C++.

You can't assign a POD to a string, and you can't assign a string to a POD. I call BS (Big Stupid).

I made a simple converter class. How might I improve it? Right now you can assign a STR to a POD, and a POD to a STR. It's really simple: it just uses std::stringstream's built-in templated operator<< and operator>> to convert from data to string and back.

#include <iostream>
#include <sstream>
#include <string>
#include <typeinfo>

class STR {
   std::string s;
   std::stringstream sstrm;
public :

   /// Constructors
   STR() : s("") , sstrm() {}
   template <class T>
   STR(T t) : s("") , sstrm() {
      sstrm.clear();
      sstrm << t;
      s = sstrm.str();
   }
   STR(const char* cstr) : s(cstr) , sstrm() {}
   STR(std::string str)  : s(str)  , sstrm() {}

   /// Operator =
   template <class T>
   STR& operator=(const T& t) {
      sstrm.clear();
      sstrm.str("");   // discard old contents, otherwise << appends to or overwrites them
      sstrm << t;
      s = sstrm.str();
      return *this;
   }
   STR& operator=(const char* cstr) {
      s = cstr;
      return *this;
   }
   STR& operator=(std::string str) {
      s = str;
      return *this;
   }

   /// Cast operator
   template<class T>
   operator T () {
      std::cout << "operator " << typeid(T).name() << " called" << std::endl;
      sstrm.str(s);
      T t;

      if (!sstrm.good()) {
         std::cout << sstrm.fail() << sstrm.eof() << std::endl;
      }

      sstrm >> t;

      std::cout << "T is " << t << " and s is \"" << s << "\"" << std::endl;

      return t;
   }

   /// stream operator
   friend std::ostream& operator<<(std::ostream& os , const STR& str) {
      os << str.s;
      return os;
   }
};

And its use is pretty simple.

int main(int argc , char** argv) {

   STR s1("5");
   STR s2("5.5");
   STR s3("5.55");
   STR s4("5.55555555555555555555555555");

   int x1 = s1;
   float y1 = s2;
   double z1 = s3;

   int x2 = s4;
   float y2 = s4;
   double z2 = s4;

   std::string x = STR(x1);
   std::string y = STR(y1);
   std::string z = STR(z1);

   std::cout << "xyz is : " << x << " , " << y << " , " << z << std::endl;

   std::cout << "x1y1z1 is : " << x1 << " , " << y1 << " , " << z1 << std::endl;

   std::cout << "x2y2z2 s4 is : " << x2 << " , " << y2 << " , " << z2 << " , " << s4 << std::endl;

   return 0;
}

Output :

xyz is : 5 , 5.5 , 5.55
x1y1z1 is : 5 , 5.5 , 5.55
x2y2z2 s4 is : 5 , 5.55556 , 4551.11 , 5.55555555555555555555555555

Looks like stringstream has trouble reading a double with that much precision. I could use sscanf I suppose. Maybe that would be better.

l j

5.55555555555555555555555555 as a double should store something close to 5.555555556 (a double keeps roughly 15 to 17 significant digits), not 4551.11.

C++11 actually added a lot of utilities for strings.

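std::to_string converts a primitive to a string (you should be able to provide your own to_string for custom types, though it has to live in your own namespace, since adding overloads to namespace std isn't allowed), and std::sto*, where * is i, l, f... depending on which type you want, converts a string to a primitive type.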
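For reference, a minimal round trip through those C++11 functions. The helper names round_trips and reparse_double are made up for this sketch; std::to_string, std::stoi and std::stod are the standard <string> functions. Note that std::to_string on a floating-point value formats like "%f", so it always gives six decimals: std::to_string(5.5) is "5.500000".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format an int with std::to_string, parse it back with
// std::stoi, and report whether the value survived the round trip.
bool round_trips(int value) {
    std::string text = std::to_string(value);  // e.g. 5 -> "5"
    return std::stoi(text) == value;            // "5" -> 5
}

// Hypothetical helper for floating point: std::to_string uses fixed notation
// with six decimals (like "%f"), and std::stod parses that text back.
double reparse_double(double value) {
    return std::stod(std::to_string(value));   // 5.5 -> "5.500000" -> 5.5
}
```

Because of the fixed six decimals, digits past the sixth decimal place are rounded away, so std::to_string is not a substitute for a stream with a chosen precision when a value like 5.55555555555555555555555555 matters.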

Edgar Reynaldo

That's not what you need, though. What you need is sane string-to-primitive conversion. You can't add methods (such as conversion operators) to std::string, which sucks. You can't overload assignment as a global function either, since operator= has to be a member.

There was a problem with the stream. I cleared the failbit and it was all good again, but there is still a lack of precision for the double.
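A likely explanation of why clearing the stream fixed it: in the first version, reading the float from s4 consumed the whole buffer and set eofbit. sstrm.str(s) swaps in new characters but does not reset the state flags, so the following >> into the double failed before reading anything and left the uninitialized t holding garbage (4551.11). This sketch, with a hypothetical read_twice helper and only standard <sstream> calls, shows the effect in isolation:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reuse one stringstream for two extractions, optionally
// calling clear() before reloading the buffer. Returns the second value read,
// or -1.0 if the second extraction failed.
double read_twice(const std::string& text, bool clear_between) {
    std::stringstream ss;
    double value = -1.0;

    ss.str(text);
    ss >> value;              // reads to the end of the buffer, so eofbit is set

    ss.str(text);             // replaces the characters but keeps the state flags
    if (clear_between) {
        ss.clear();           // reset eofbit/failbit/badbit back to goodbit
    }

    double second = -1.0;
    if (ss >> second) {       // fails without reading while eofbit is still set
        return second;
    }
    return -1.0;
}
```

With clear_between set to false the second read fails every time, which matches the symptom in the first version's output.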

New code :

#include <iostream>
#include <sstream>
#include <string>
#include <typeinfo>

class STR {
   std::string s;
   std::stringstream sstrm;
public :

   /// Constructors
   STR() : s("") , sstrm() {}
   template <class T>
   STR(T t) : s("") , sstrm() {
      sstrm.clear();
      sstrm << t;
      s = sstrm.str();
   }
   STR(const char* cstr) : s(cstr) , sstrm() {}
   STR(std::string str)  : s(str)  , sstrm() {}

   /// Operator =
   template <class T>
   STR& operator=(const T& t) {
      sstrm.clear();
      sstrm.str("");   // discard old contents, otherwise << appends to or overwrites them
      sstrm << t;
      s = sstrm.str();
      return *this;
   }
   STR& operator=(const char* cstr) {
      s = cstr;
      return *this;
   }
   STR& operator=(std::string str) {
      s = str;
      return *this;
   }

   /// Cast operator
   template<class T>
   operator T () {
      sstrm.clear();   // reset eofbit/failbit left over from the previous extraction
      sstrm.str(s);

      T t{};           // value-initialized, so a failed read doesn't return garbage
      sstrm >> t;
      return t;
   }
   operator std::string () {return s;}   // whole string, not just the first word

   /// stream operator
   friend std::ostream& operator<<(std::ostream& os , const STR& str) {
      os << str.s;
      return os;
   }
};

New output :

xyz is : 5 , 5.5 , 5.55
x1y1z1 is : 5 , 5.5 , 5.55
x2y2z2 s4 is : 5 , 5.55556 , 5.55556 , 5.55555555555555555555555555
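The remaining "lack of precision" is most likely an output-formatting default rather than a parsing problem: streams print floating-point values with 6 significant digits unless told otherwise, which turns 5.55555555555555555555555555 into 5.55556 on screen. (A float really holds only about 7 significant digits, so for y2 not much is lost.) A sketch with a hypothetical format_double helper, using std::setprecision and std::numeric_limits<double>::max_digits10, the digit count that guarantees a double survives a text round trip:

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a double either with the stream default
// (6 significant digits, as in the output above) or with max_digits10
// significant digits, which is enough to read back the identical double.
std::string format_double(double value, bool full_precision) {
    std::ostringstream os;
    if (full_precision) {
        os << std::setprecision(std::numeric_limits<double>::max_digits10);
    }
    os << value;
    return os.str();
}
```

The same std::setprecision call works on std::cout directly if the goal is only to see the stored digits.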

Chris Katko

This is all way easier in D.

It's just

//just kidding

Kitty Cat

Edgar Reynaldo said:

That's not what you need though, what you need is sane string to primitive conversion.

Define "sane".

std::string str("abc");
int a = str;

There is no sane way to do this. You either have to heavy-handedly throw an exception, or set some kind of default value without reporting any kind of error. It gets even worse if you have

std::string str("123abc");
int a = str;

Should this throw an exception? Silently fail and set a default int value? Set the int to 123 and silently ignore the remaining characters? How about:

std::string str("fffe");
int a = str;

Is this a failure, or does it set 0xfffe? Either way, it becomes a limiting factor since different use-cases will want different results.

The std::sto* functions handle this with optional parameters: one gives you the index of the first unconverted character, and (for the integer versions) another specifies the base. You can't have extra parameters with an assignment operator. Implicit conversions between fundamentally different types are generally bad mojo, since it's not obvious what you get, they suppress potential errors, and they can have unintended side effects when interacting with other parts of the language.
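Applied to the three strings above, std::stoi's optional idx and base parameters give one concrete answer to each question. A sketch in which try_stoi is a hypothetical wrapper; the behavior it reports is std::stoi's own:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: parse `text` with std::stoi in the given base and
// report how many characters were consumed. Returns false instead of
// propagating std::invalid_argument when no conversion is possible
// (std::out_of_range is still allowed to propagate).
bool try_stoi(const std::string& text, int base, int& value, std::size_t& used) {
    try {
        value = std::stoi(text, &used, base);
        return true;
    } catch (const std::invalid_argument&) {
        return false;   // no leading digits in this base, e.g. "abc" in base 10
    }
}
```

Note that "abc" converts fine in base 16 (as 0xabc), which is exactly the kind of case where a single assignment operator would have to pick one interpretation for everybody.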

Edgar Reynaldo

std::sto* will throw an exception too. It's no better.

Why would I want to write this :

std::string str = "5";
std::size_t read = 0;
int x = std::stoi(str , &read);
if (read < 1) {
   /// Error
}

When I could do this :

int x = STR("5");
STR s = x;
int x2 = s;
assert(x == x2);

Throwing an exception is the right thing to do here. If you really need specialized number conversion, you could pass things like the base to the STR constructor. If the string doesn't represent a number, then throw. If there are extra characters, ignore them. In my opinion it should basically follow the behavior of sscanf.

I did this for config files, and because I'm tired of converting and calling sscanf all the time.
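A minimal sketch of the semantics described above (throw when nothing converts, ignore trailing characters, optional base), written as a free function template rather than as a change to the STR class. The name from_string is hypothetical. The base is applied with std::setbase, which only recognizes 8, 10 and 16 (other values make the stream detect the base from a 0 or 0x prefix) and has no effect on floating-point reads:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert the leading part of `text` to T.
// - Throws std::invalid_argument if no value could be extracted.
// - Ignores any characters after the value (sscanf-like behavior).
// - `base` goes through std::setbase, so it only affects integer types.
template <class T>
T from_string(const std::string& text, int base = 10) {
    std::istringstream in(text);
    T value{};
    if (!(in >> std::setbase(base) >> value)) {
        throw std::invalid_argument("from_string: cannot convert \"" + text + "\"");
    }
    return value;
}
```

Whether "123abc" should yield 123 or an error is the policy question raised earlier in the thread; this sketch simply takes the sscanf-like answer.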

Kitty Cat

Edgar Reynaldo said:

std::sto* will throw an exception too. It's no better.

Why would I want to write this :

std::string str = "5";
std::size_t read = 0;
int x = std::stoi(str , &read);
if (read < 1) {
   /// Error
}

When I could do this :

int x = STR("5");
STR s = x;
int x2 = s;
assert(x == x2);

As you say, std::sto* will throw an exception if no conversion could be performed, so you don't have to check if(read < 1), since a successful return means at least one character was converted. You might want to check if(read < str.length()), but that's the kind of information you'd want in a parser. And if you don't care about how much was read (thus whether there's anything after the integer characters), you don't need to pass in a &read parameter. You can do:

int x = 5;
std::string s = std::to_string(x);
int x2 = std::stoi(s); // or int x2 = std::atoi(s.c_str()); for no exception from no conversion.
assert(x == x2);
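For the parser case, where anything after the number should count as an error, the idx out-parameter is all that's needed on top of std::stoi. A sketch with a hypothetical parse_whole_int helper:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: like std::stoi, but the entire string must be a number.
// std::stoi already throws std::invalid_argument when nothing converts; this
// adds a check on the index of the first unconverted character.
int parse_whole_int(const std::string& str) {
    std::size_t read = 0;
    int value = std::stoi(str, &read);
    if (read < str.length()) {
        throw std::invalid_argument("trailing characters in \"" + str + "\"");
    }
    return value;
}
```

Leading whitespace is still skipped (std::stoi does that), but trailing whitespace counts as trailing characters here.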

Languages like Python and JavaScript keenly show the problems inherent to implicit conversions of fundamentally different types. It may feel liberating at first, but then you start finding the corner cases, getting utterly confused at the results you're seeing and wishing the compiler/linter would tell you something's fishy, but it can't because it's all valid.

bamccaig

This conversation made me think of two things. First, dynamically-scoped variables. Second, a scalar type such as in Perl that represents a string or number depending on the context (automatically converting between the two under the hood). Of course, neither fits into a static world where people are obsessed with machine cycles and memory, such as C or C++, but they're interesting ideas to consider anyway.

Dynamically-scoped variables are basically a form of "global" variable that can be temporarily changed, seen as changed for the duration of the nested code, and then automatically reverted back after. Perl achieves this with the local keyword. The idea existed in Lisp long before.

# Perl 5.
use v5.022;   # also enables strict, so the global must be declared with our

our $x = 5;

sub multiply {
    my $arg = shift;

    return $x * $arg;
}

sub nested1 {
    my $arg = shift;

    # Until we leave this scope $x is now 10.
    local $x = 10;

    return multiply($arg);
}

say multiply(5);    # 5*5 = 25
say nested1(5);     # 10*5 = 50
say multiply(7);    # 5*7 = 35

The reason this could be useful is that you could change the context in which a simple assignment is executed. For example, you could say "anywhere within this scope, conversion errors are silently ignored", or "anywhere within this scope conversions must match the entire string instead of a prefix". You could even potentially define callbacks to handle specific types of cases at runtime.

int my_conversion_handler(
        const std::string & input,
        const std::conversion_error_type type,
        long index,
        const std::whatever & whatever)
{
    // Do stuff.
    // Return a fix, log the error to a file, email it to the developer, whatever.
    return 0;
}

// ...

{
    local std::conversion_error_handler = my_conversion_handler;

    // Use it without affecting the rest of the program or having to do
    // error-prone try...catches to try to restore the state perfectly.
}

(Note: I don't really know C++11 or any of that newfangled shenanigans so I'm just making it up)
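C++ has no local keyword and the standard library has no conversion_error_handler, but the save, set and restore behavior described above can be approximated with an RAII guard around a thread_local variable. A sketch in which every name (ConversionErrorHandler, zero_handler, current_handler, ScopedHandler) is hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical handler signature: receives the text that failed to convert
// and returns the value to use instead.
using ConversionErrorHandler = int (*)(const std::string& input);

inline int zero_handler(const std::string&) { return 0; }

// The "dynamically scoped" setting: global in reach, but one copy per thread.
thread_local ConversionErrorHandler current_handler = zero_handler;

// RAII guard playing the role of Perl's `local`: installs a handler for the
// lifetime of the guard and restores the previous one in its destructor,
// which also runs when an exception unwinds the scope.
class ScopedHandler {
public:
    explicit ScopedHandler(ConversionErrorHandler handler)
        : saved_(current_handler) {
        current_handler = handler;
    }
    ~ScopedHandler() { current_handler = saved_; }

    ScopedHandler(const ScopedHandler&) = delete;
    ScopedHandler& operator=(const ScopedHandler&) = delete;

private:
    ConversionErrorHandler saved_;
};
```

Unlike Perl's local, this only covers code that reads current_handler explicitly; nothing in the language consults it automatically.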

As for the scalar type as in Perl, there are machine-friendly types under the hood to store data, but it's basically a "variant" type that can store any number of types of values. How it is interpreted depends on the context. If you are doing mathematics then the value is converted to a number (0 if that fails), but if you're doing string concatenation then the value is converted to a string implicitly. It's all about the context of how you're using it, and there's no ambiguity from the operators (string concatenation is distinct from addition). JavaScript goes awry because there is ambiguity: "+" could mean addition or concatenation, and which one depends on the operands. That's where the problems come from. Having distinct operations eliminates that source of errors.

If userland cared to validate that a scalar has a specific format/type, you would likely test it with a regex in Perl (though there are probably already libraries out there for doing it more efficiently). On the other hand, often you don't care about the error, and defaulting to zero is sufficient for the program to keep working. Ultimately, validating input is a separate concern of the program from storing or evaluating the data.
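The distinct-operators idea can be sketched in C++ too. Scalar, add and concat below are hypothetical names and only loosely modeled on Perl: the value is stored as text, numeric context parses the leading number with std::strtod (0 if there is none), and string context uses the text unchanged.

```cpp
#include <cassert>
#include <cstdlib>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical Perl-flavored scalar: the value is kept as text and is
// interpreted according to the operation applied to it.
class Scalar {
public:
    Scalar(std::string text) : text_(std::move(text)) {}
    Scalar(const char* text) : text_(text) {}
    Scalar(double number) : text_(format(number)) {}

    // Numeric context: parse the leading number; std::strtod yields 0.0
    // when no conversion can be performed (e.g. "abc").
    double num() const { return std::strtod(text_.c_str(), nullptr); }

    // String context: the stored text as-is.
    const std::string& str() const { return text_; }

private:
    static std::string format(double number) {
        std::ostringstream os;
        os << std::setprecision(15) << number;  // roughly like Perl's "%.15g"
        return os.str();
    }

    std::string text_;
};

// Separate operations for separate meanings: no "+" that might mean either.
inline Scalar add(const Scalar& a, const Scalar& b) { return Scalar(a.num() + b.num()); }
inline Scalar concat(const Scalar& a, const Scalar& b) { return Scalar(a.str() + b.str()); }
```

Because add and concat are different functions, add("5", "abc") is 5 and concat("5", "abc") is "5abc", with no operand-dependent guessing.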

Neil Roy

In C...

#include <stdlib.h>

char *str = "5";
int x = atoi(str);

Edgar Reynaldo

In C...

const char* str = 5;

Oh, right...

Polybios

Implicit string conversion is evil. >:(

bamccaig

Data is data is data. The type is just a tool to manipulate it. If the language or library gives you other tools to manipulate it then the type becomes less important. Obviously, there are performance benefits to having static types, but it comes at a development cost having to manage the types. It's all about give and take.

The day you can do the full array (no pun intended) of basic string operations in C and C++ without third party libraries, rolling your own, or stupid shenanigans will be the day that C and C++ programmers can smugly defend manual type conversions.

Edgar Reynaldo

Actually, on second thought, std::to_string and std::sto* aren't all that bad. It just annoys me that I can't use the same function. That's why overloading operator int () and such is so keen.

Neil Roy

Honestly, I have never had a reason to convert from a char* to an int while programming. I'm not sure under what circumstances this would be needed.

Edgar Reynaldo said:

const char* str = 5;

const char *str = 5;

and

char *str = "5";

are not the same thing. They give str different values: str = 5 tries to store the address 5 in the pointer (compilers reject or at least warn about that without a cast), whereas str = "5" makes str point to the string literal "5", i.e. the character '5' followed by a terminating zero.

Edgar Reynaldo

What, you've never used sscanf before?

I know they're not the same. I was illustrating the point that you can't auto convert a number to a string.

Chris Katko

I can. It's simple. All I do (and I do this in all my projects) is write my own LISP interpreter, in which I then write an e-mail calendar app, which then sends an e-mail request to a JavaScript server with the text of the variable field, which gets sent through the Google API, does a Google search and returns the first result for "convert # to integer", which then downloads the Stack Overflow code and runs it on Godbolt, and then returns the result. But instead of reading it, it prints out a fax to a guy named Larry down the road with the same question and source code, and he picks up a phone (knocking his internet modem offline) and talks to "some guy" who then enters the data into a SQL database, from which my simple app then returns the data.

Simple guys. Geez. Stop using old languages. ::)

Unfortunately, when I wrote all that, I wasn't experienced in generic programming, so I have a whole different workflow for floats that isn't anywhere near as simple or elegant.

Edgar Reynaldo

Dude, a script kiddie could hack that. He could use a man-in-the-middle attack to intercept your email, and I'm sure there's a JS exploit or two that could be used on that.

We're talking CS 101 here. Basic stuff. C'mon, Chris. ::)

[Image: Integer to String Conversion]

bamccaig

char[] to int is needed if you read a number from a text stream or text field. Happens all day every day.

Samuel Henderson

bamccaig said:

Happens all day every day.

Sometimes even well into the night!

Neil Roy

Edgar Reynaldo said:

I know they're not the same. I was illustrating the point that you can't auto convert a number to a string.

Ahhhh, I see what you meant by that now, sorry. Not automatically, but...

sprintf(str, "%d", 5);

The above is one line of code, no more complicated than a single assignment line.

The reason it is not automatic is simple: the computer deals with numbers, and a string of characters in a computer is a string of numbers, so there can't be any straightforward way to do it without your own functions. Functions like sprintf() (or better yet, snprintf()) are your best bet in C, or whatever string functions C++ provides.

Edgar Reynaldo

You forgot the other lines of code that go with it.

In C :

#include <stdio.h>
#include <stdlib.h>

char* NewIntStr(int n) {
   const int BUFSIZE = 16;
   char* buf = (char*)malloc(sizeof(char)*(BUFSIZE+1));
   if (buf) snprintf(buf , BUFSIZE+1 , "%d" , n);   /* buffer, size, format, value */
   return buf;
}

char* istr = NewIntStr(5);
free(istr);

In C++11

std::string five = std::to_string(5);

An awful lot of std::BS in there with the namespace, but compare the two. One example is 7 lines of code, the other is 1.

There is exactly one way to convert a whole number to a string. There's no reason that can't be built into the thing. And there could be easy floating point conversion as well.

There used to be itoa and atoi and the like (atoi is still standard C; itoa never was).

bamccaig

Neil Roy said:

That's a copout. It is automatic in plenty of languages/platforms. C can't really do it because it has no standardized, high-level string concept, and with just a pointer to a character C can't guarantee there's even enough space in the string to fit the number. If there was an array of characters the compiler could figure it out, but only in the declaring function. Most of the time we deal with pointers anyway because heap memory can travel more freely (no pun intended), and of course an array becomes a glorified pointer when passed around.

The C language could support a "string" type that does much more magically (i.e., by the compiler converting syntax to function calls). It just doesn't because it might not produce optimal code, and so instead everybody has to do everything the hard way all the time. It works for some projects, but not for all of them.

Neil Roy

C is a language designed to be close to machine language. It's what makes it fast and portable. Sure there are lots of higher level languages that can do this, AT A COST.

There's a reason why C is called a "low level language" and faster than the higher level languages. This is one of them. Certainly it's a trade off. Take your pick. I prefer C and these little things don't bother me. I rarely have to convert an int to a char*.

When I do, usually to display a score, I will have a temporary array I store it in, one I have set aside as a general purpose array just for this type of thing. Usually something like char text[4096];, and then I will use snprintf() to create it as I already showed (snprintf() to avoid overflow). It's very simple, and unlike what Edgar just showed, it's not that much code.

Actually, in my Deluxe Pacman 2 game I have the following...

#define TEXT_BUFFER        4096

Then later I have...

char buffer[TEXT_BUFFER];

And then I will use it, for example, when I tell the player to get ready at the start of the level...

snprintf(buffer, TEXT_BUFFER, "GET READY PLAYER %d", cplayer);

No malloc() or the other code Edgar implied I had to use. Just this, nothing else, as none of the text I need will ever exceed 4096 characters (or even come close, really).

Simple, easy to use, fast. I swear people look for the most complicated solutions to problems where simple answers are available. But to each his own.

I always say, the best language to program in is the one you enjoy using. For me, it is C.

Edgar Reynaldo

Neil Roy said:

I rarely have to convert an int to a char*.

Well, in my latest game I do it quite often. String to number to string happens a lot when using my configuration file. I use printf quite often, but I also use stream operators.

For example, here is a printf style function that uses vsnprintf to return a string :

#include <cstdarg>
#include <cstdio>
#include <string>

const unsigned int STRINGPRINTF_BUFFER_SIZE = 1024;   // const, so the array below is not a VLA

std::string StringPrintF(const char* format_str , ...) {
   char buffer[STRINGPRINTF_BUFFER_SIZE];   // per-call stack buffer; longer output is truncated
   va_list args;
   va_start(args , format_str);
///int vsnprintf (char * s, size_t n, const char * format, va_list arg );
   std::vsnprintf(buffer , STRINGPRINTF_BUFFER_SIZE , format_str , args);
   va_end(args);
   return std::string(buffer);
}

Reasons it's better than snprintf?
1. No static buffer.
2. Thread-safe.
3. Returns std::string, so memory management is automatic.

And I've long been overloading stream operators in my classes. Take my EagleObject class for example - you can insert it into any kind of ostream you want :

   virtual std::ostream& DescribeTo(std::ostream& os , Indenter indent = Indenter()) const ;
};/// </EagleObject class>

std::ostream& operator<<(std::ostream& os , const EagleObject& obj);

std::ostream& EagleObject::DescribeTo(std::ostream& os , Indenter indent) const {
  return os << indent << FullName() << std::endl;
}

std::ostream& operator<<(std::ostream& os , const EagleObject& obj) {
   return obj.DescribeTo(os);
}

Now, any EagleObject object can pass itself to an ostream, and it will call the virtual DescribeTo method, which lets a class define how to stringify itself.

Widget w;
EagleLog() << w << std::endl;
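A self-contained sketch of the same DescribeTo pattern, with hypothetical Object and Widget classes standing in for EagleObject and its subclasses (the real Eagle classes, Indenter and FullName are not reproduced here). One operator<< for the base class forwards to a virtual member, so each subclass controls its own text form, and a small hypothetical to_text template turns anything streamable into a std::string:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class Object {
public:
    virtual ~Object() = default;
    // Subclasses override this to describe themselves.
    virtual std::ostream& DescribeTo(std::ostream& os) const {
        return os << "Object";
    }
};

class Widget : public Object {
public:
    std::ostream& DescribeTo(std::ostream& os) const override {
        return os << "Widget";
    }
};

// One inserter for the whole hierarchy; virtual dispatch picks the subclass.
inline std::ostream& operator<<(std::ostream& os, const Object& obj) {
    return obj.DescribeTo(os);
}

// Turn anything with an operator<< into a std::string.
template <class T>
std::string to_text(const T& value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

Inserting through a const Object& still prints "Widget" for a Widget, which is the point of routing operator<< through a virtual function.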

EDIT
And because streams can have std::strings inserted, I can use both together like when I throw an exception :

throw EagleException(StringPrintF("Hello my name is %s %s. You killed my father, prepare to die!\n" , "Inigo" , "Montoya"));

Or when logging :

EagleError() << StringPrintF("Your application decided to crash. We're restarting Windows for you.") << std::endl;

bamccaig

Looks like you can't pass a std::string to StringPrintF though. :-/

Chris Katko

literally_anything.toString();

Because all objects inherit from the Object class and have a toString method.

Also,

to!anything_else(almost_anything);

It makes (un)serializing really easy to a file for loading/saving maps.

Audric

I haven't seen it mentioned so far, but GNU's and BSD's libcs include asprintf(): it's a sprintf() variant where you don't pass your own target buffer. It computes how long the string will be, performs a malloc(), fills it, and hands you the address through a char** argument (the return value is the length, or -1 on failure). It's then your responsibility to free() it when you're done with it.

Edgar Reynaldo

Is that just a GNU C extension? I doubt Micro$oft supports it. Otherwise I would make StringPrintF use it.
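If asprintf isn't available, the same measure, allocate and fill idea can be done in standard C++11 with std::vsnprintf: called with a null buffer and size 0 it returns the length the output would have, and va_copy lets the argument list be walked twice. A sketch, where StringPrintFAlloc is a hypothetical name and a C99-conforming vsnprintf is assumed (older MSVC runtimes, before Visual Studio 2015, are known to differ):

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical variant of StringPrintF without a fixed-size buffer:
// measure first, then allocate exactly enough and format again.
std::string StringPrintFAlloc(const char* format_str, ...) {
    va_list args;
    va_start(args, format_str);

    va_list args_copy;
    va_copy(args_copy, args);                 // a va_list can only be walked once
    int needed = std::vsnprintf(nullptr, 0, format_str, args_copy);
    va_end(args_copy);

    std::string result;
    if (needed > 0) {
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);  // +1 for '\0'
        std::vsnprintf(buffer.data(), buffer.size(), format_str, args);
        result.assign(buffer.data(), static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;                            // empty on error or empty output
}
```

The cost is formatting twice; the benefit is that output longer than any fixed buffer is never silently truncated.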

bamccaig

The implementations would be free software. Could include the definition with CPP.

Edgar Reynaldo

bamccaig said:

Looks like you can't pass a std::string to StringPrintF though.

Not currently. It might be possible with a bit of a hack, but you would have to find a non-reserved format specifier to use and be consistent with it.
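For the std::string case, one hack-free option in C++11 is a variadic template in front of the C formatter that swaps each std::string argument for its c_str() before the arguments reach the C varargs (passing a std::string object itself through ... is not something you can rely on). A sketch in which Format and as_printf_arg are hypothetical names and std::snprintf does the formatting, called once to measure and once to fill:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Pass most arguments through unchanged...
template <class T>
const T& as_printf_arg(const T& value) { return value; }

// ...but hand %s a C string when given a std::string.
inline const char* as_printf_arg(const std::string& value) { return value.c_str(); }

// Hypothetical printf-style formatter that accepts std::string arguments.
template <class... Args>
std::string Format(const char* format_str, const Args&... args) {
    int needed = std::snprintf(nullptr, 0, format_str, as_printf_arg(args)...);
    if (needed <= 0) {
        return std::string();
    }
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);  // +1 for '\0'
    std::snprintf(buffer.data(), buffer.size(), format_str, as_printf_arg(args)...);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

The c_str() pointers are only used inside Format, while the caller's strings (including temporaries) are still alive, so nothing dangles.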

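There's a bit of a problem with StringPrintF though, and that is when you keep a pointer into the temporary string it returns, like so :

const char* msg = StringPrintF("Hello my name is dangling pointer").c_str();
printf("%s\n" , msg);   // msg dangles: the temporary was destroyed at the end of the previous statement
// printf("%s\n" , StringPrintF("...").c_str()); is fine: the temporary lives until the end of that call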

Thread #617384. Printed from Allegro.cc