convert pixel formats

Mark Oates

Basically, I'm trying to figure out how to finish this function (and do it correctly):

inline uint32_t make_argb_8888(ALLEGRO_COLOR &col)
{
  unsigned char r = (int)(255 * col.r);
  unsigned char g = (int)(255 * col.g);
  unsigned char b = (int)(255 * col.b);
  unsigned char a = (int)(255 * col.a);

  // uh....
}

Something makes me think I should do:

return (0x00ff0000 * r) + (0x0000ff00 * g) + (0x000000ff * b) + (0xff000000 * a);

That's my best guess, but I don't know why.

Matthew Leverton

((uint32_t)a << 24) | (r << 16) | (g << 8) | b
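Why the shift version works and the multiplication guess doesn't: shifting left by 16 is the same as multiplying by 0x10000, not by the mask 0x00ff0000 (which is 255 * 0x10000). The masks are what you AND with to pull a channel back out. Below is a minimal self-contained sketch. The names pack_argb_8888 and unpack_red are hypothetical. Each byte is widened to uint32_t before shifting, because an unsigned char is promoted to int and 255 << 24 does not fit in a signed 32-bit int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four 8-bit channels into one 0xAARRGGBB word.
// Each channel is widened to uint32_t before shifting; shifting the promoted
// int directly (e.g. 255 << 24) would overflow a signed 32-bit int.
constexpr std::uint32_t pack_argb_8888(std::uint8_t r, std::uint8_t g,
                                       std::uint8_t b, std::uint8_t a)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) |
           (std::uint32_t(g) << 8) | std::uint32_t(b);
}

// The masks are for pulling a channel back out, not for building the word.
constexpr std::uint8_t unpack_red(std::uint32_t argb)
{
    return std::uint8_t((argb & 0x00ff0000u) >> 16);
}

// Shifting left by 16 multiplies by 0x10000; multiplying by the mask
// 0x00ff0000 (= 255 * 0x10000) gives a different, wrong value.
static_assert((0x12u << 16) == 0x12u * 0x10000u, "shift is a power-of-two multiply");
static_assert((0x12u << 16) != 0x12u * 0x00ff0000u, "the mask is not the multiplier");
static_assert(pack_argb_8888(0x12, 0x34, 0x56, 0x78) == 0x78123456u, "packs as AARRGGBB");
static_assert(unpack_red(0x78123456u) == 0x12, "mask + shift recovers red");
```

The static_asserts check the arithmetic at compile time, so the block needs no main to demonstrate it.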

Arthur Kalliokoski

If you put these four chars in a struct inside a union with an int, you could skip the addition and multiplication in the return value by using the address offsets as a "multiplication" substitute. IIRC, that's the way A4 (Allegro 4) did it.
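A sketch of the byte-offset idea, under stated assumptions. It writes each channel at its own address offset and reads the word back with std::memcpy. That is the well-defined way to reinterpret bytes in C++, since reading a union member other than the one last written is not sanctioned by the C++ standard (C does allow it). The bytes go into memory in the order B, G, R, A. So the result equals 0xAARRGGBB only on a little-endian host such as x86; on a big-endian machine the word comes out in a different order. The names pack_argb_bytes and host_is_little_endian are hypothetical, and this is not claimed to be how A4 actually did it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch of the "address offsets instead of shifts" idea.
// Bytes are stored in memory order B, G, R, A, which reads back as
// 0xAARRGGBB only when the host is little-endian (e.g. x86).
inline std::uint32_t pack_argb_bytes(std::uint8_t r, std::uint8_t g,
                                     std::uint8_t b, std::uint8_t a)
{
    unsigned char bytes[4] = { b, g, r, a }; // offset 0 = least significant on little-endian
    std::uint32_t word;
    std::memcpy(&word, bytes, sizeof word);  // well-defined reinterpretation
    return word;
}

// Runtime check of host byte order, so callers can tell whether the
// byte-offset version agrees with the shift version.
inline bool host_is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

Modern compilers typically turn a fixed-size memcpy like this into a plain register move, so it need not cost a real function call.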

You might look into SSE if you don't care about portability.
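For the SSE route, here is a minimal sketch using SSE2 intrinsics from <emmintrin.h> (x86 only) for a single pixel. It is guarded by a preprocessor check that falls back to plain shifts elsewhere. The function name argb_from_floats is hypothetical. It truncates like the (int)(255 * col.r) casts in the original post. Both paths agree for channels in [0, 1]; outside that range the SSE path clamps to 0..255 and the fallback does not. A real pixel loop would convert several pixels per instruction sequence rather than one.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Hypothetical sketch: convert four float channels in [0, 1] to one
// 0xAARRGGBB word, truncating like (int)(255 * x).
inline std::uint32_t argb_from_floats(float r, float g, float b, float a)
{
#if defined(__SSE2__) || defined(_M_X64)
    // _mm_set_ps takes lanes high-to-low, so lane 0 = b, lane 3 = a.
    __m128 v = _mm_mul_ps(_mm_set_ps(a, r, g, b), _mm_set1_ps(255.0f));
    __m128i i32 = _mm_cvttps_epi32(v);        // truncate to 4 x int32
    __m128i i16 = _mm_packs_epi32(i32, i32);  // 8 x int16, signed saturation
    __m128i u8  = _mm_packus_epi16(i16, i16); // 16 x uint8, unsigned saturation
    // The low 32 bits hold bytes b, g, r, a; x86 is little-endian,
    // so reading them as one integer gives 0xAARRGGBB.
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(u8));
#else
    // Portable fallback: the plain shift version.
    auto to_byte = [](float x) {
        return static_cast<std::uint32_t>(static_cast<int>(255.0f * x));
    };
    return (to_byte(a) << 24) | (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b);
#endif
}
```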

Mark Oates

Fantastic! Thanks, Matthew.

My result:

unsigned char r, g, b, a;

inline uint32_t make_argb_8888(ALLEGRO_COLOR &col)
{
  al_unmap_rgba(col, &r, &g, &b, &a);
  // cast a first: it is promoted to int, and 255 << 24 overflows a signed int
  return ((uint32_t)a << 24) | (r << 16) | (g << 8) | b;
}

Would I get any real benefit from using a union? It seems like it would just involve more shuffling and processor time.

Arthur Kalliokoski

I tried timing both with clock(), and they seemed to run at about the same speed. I didn't check that the logic is correct, but as long as I don't overrun the arrays, the timings should be about what the "right" way would take.
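A sketch of that kind of timing test. It assumes nothing about the attached program beyond what the post says. It fills a buffer of 8-bit RGBA channels, packs every pixel with the shift version and with a byte-placement (memcpy) version, and times each pass with std::clock() from <ctime>. The function names, buffer size and repetition count are hypothetical. The checksums keep the compiler from discarding the work. With a buffer much larger than the cache, the loop mostly measures memory traffic, which may be one reason the two versions come out about equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <ctime>
#include <vector>

// Shift-and-or packing (channels widened before shifting). c = {r, g, b, a}.
static inline std::uint32_t pack_shift(const unsigned char* c)
{
    return (std::uint32_t(c[3]) << 24) | (std::uint32_t(c[0]) << 16) |
           (std::uint32_t(c[1]) << 8) | std::uint32_t(c[2]);
}

// Byte-placement packing; equals pack_shift only on a little-endian host.
static inline std::uint32_t pack_bytes(const unsigned char* c)
{
    unsigned char bytes[4] = { c[2], c[1], c[0], c[3] }; // B, G, R, A
    std::uint32_t w;
    std::memcpy(&w, bytes, sizeof w);
    return w;
}

// Runs one packer over the whole buffer `reps` times, stores the elapsed
// CPU time in `secs`, and returns a checksum so the work is not optimized away.
template <typename Packer>
std::uint32_t time_packer(Packer pack, const std::vector<unsigned char>& rgba,
                          int reps, double& secs)
{
    std::uint32_t sum = 0;
    std::clock_t start = std::clock();
    for (int rep = 0; rep < reps; ++rep)
        for (std::size_t i = 0; i + 4 <= rgba.size(); i += 4)
            sum += pack(&rgba[i]);
    secs = double(std::clock() - start) / CLOCKS_PER_SEC;
    return sum;
}

// Hypothetical driver: prints both timings and returns true if the
// two packers produced the same checksum.
inline bool run_benchmark(std::size_t pixels, int reps)
{
    std::vector<unsigned char> rgba(pixels * 4);
    for (std::size_t i = 0; i < rgba.size(); ++i)
        rgba[i] = static_cast<unsigned char>(i * 37u); // arbitrary test data
    double t_shift = 0, t_bytes = 0;
    std::uint32_t s1 = time_packer(pack_shift, rgba, reps, t_shift);
    std::uint32_t s2 = time_packer(pack_bytes, rgba, reps, t_bytes);
    std::printf("shift: %.3fs  bytes: %.3fs\n", t_shift, t_bytes);
    return s1 == s2;
}
```

clock() measures processor time with coarse resolution on some platforms, so it is worth running enough repetitions that each pass takes at least a sizeable fraction of a second.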

Click the paperclip to get the test proglet (attached to the original post).

[EDIT]

I had a brain fart about blowing out the cache (naturally they'd be the same then), but no matter what I do they run about equal, or at least bounce around as to which is faster. See t_v2.c in the attachment (paperclip).

In short, Matthew's version is better for conciseness.

ImLeftFooted

You should declare r, g, b, and a inside the function.
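A sketch of that suggestion, with the four channel values as locals instead of globals. To keep it self-contained, it uses a hypothetical Color struct standing in for ALLEGRO_COLOR (float r, g, b, a, as accessed in the first post). It does its own truncating conversion, like the first post, instead of calling al_unmap_rgba; no claim is made about how al_unmap_rgba rounds. Declaring the locals as uint32_t also removes the signed-shift overflow on a << 24. One caveat: if the locals' addresses are passed to a non-inlined function such as al_unmap_rgba, the compiler has to give them a stack slot around that call. The stack is usually hot in the cache anyway.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ALLEGRO_COLOR so the sketch builds without
// Allegro; like ALLEGRO_COLOR it holds float channels r, g, b, a.
struct Color { float r, g, b, a; };

// Scale a [0, 1] channel to 0..255 by truncation, as in the first post.
inline std::uint32_t to_byte(float x)
{
    assert(x >= 0.0f && x <= 1.0f);
    return static_cast<std::uint32_t>(255.0f * x);
}

inline std::uint32_t make_argb_8888(const Color& col)
{
    // Locals rather than globals: nothing outside this function can see
    // them, so the compiler may keep them entirely in registers.
    const std::uint32_t r = to_byte(col.r);
    const std::uint32_t g = to_byte(col.g);
    const std::uint32_t b = to_byte(col.b);
    const std::uint32_t a = to_byte(col.a);
    // The operands are already unsigned 32-bit, so a << 24 cannot overflow.
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```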

Arthur Kalliokoski

I thought I did. Unless you mean the global arrays of floats and ints that simulate buffers in memory?

Mark Oates

Quote:

I thought I did

He's talking to me.

Wouldn't that slow down the function?

Arthur Kalliokoski

Locals on the stack are supposed to be faster because the cache line holding them is usually already loaded, since the return address was just saved there. I'm not sure how that works for an inlined function, though; maybe I should check.

ImLeftFooted

It should speed it up.

Locals live on the stack, which is almost always already in the cache, so they are faster to work with than arbitrary memory.

Also, a smart compiler could keep them in registers, which would be even faster.

Mark Oates

K, I did some benchmarks on the pixel routines.

Thread #606139. Printed from Allegro.cc