I'm guessing this is moot because of implementation details, but I was wondering if there could be a use for a double precision version of al_transform_coordinates. I'm guessing Allegro uses floats in its transformation matrices, so there wouldn't be much point if that is true.
It might matter if there was a version that took double pointers instead of float pointers, because right now I have to declare two floats and perform assignment to get the data back into my double types. It's a data intensive operation in this case, so it might matter at least a little bit.
What are you working on that needs double instead of float?
I'm working on my Spiraloid program again, and I need super high precision angles for the spiral's theta value and theta offset, as well as rotation.
Something I'm doing now is using integer decimals and exponents to keep the values the same and prevent precision loss when adding values, then I convert to doubles when I go to actually use the value. But I need high precision for the transformations that I'm applying. I suppose I have a matrix class lying around here somewhere that I could use, but I really like Allegro's TRANSFORMs.
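One possible reading of the "integer decimals and exponents" idea, as a minimal sketch: keep the angle as an exact integer count of fixed-size steps and only convert to double at the point of use. The `FixedAngle` type and `accumulate_float` helper are hypothetical names made up for illustration, not code from the Spiraloid program.

```cpp
#include <cstdint>

// Hypothetical fixed-step angle: the value is stored as an integer count of
// steps, where one step is 10^-decimals degrees. Integer addition is exact,
// so repeated increments never accumulate rounding error.
struct FixedAngle {
    std::int64_t steps;  // exact integer count of steps
    int decimals;        // one step equals 10^-decimals degrees

    void add(std::int64_t delta_steps) { steps += delta_steps; }

    // Convert to double only when the value is actually needed.
    double to_degrees() const {
        double scale = 1.0;
        for (int i = 0; i < decimals; ++i) scale *= 10.0;
        return static_cast<double>(steps) / scale;
    }
};

// For comparison: adds 0.001 degrees n times by naive float accumulation,
// which rounds on every addition.
float accumulate_float(int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += 0.001f;
    return sum;
}
```

Adding one step a million times to a `FixedAngle{0, 3}` gives exactly 1000 degrees on conversion, while the float loop drifts away from 1000.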
Edit
I have a list of spiral coordinates that need to be updated anytime the scale, offset, or rotation changes. The rotation changes fairly often, as the spiraloid may be spinning. There may be as many as (sqrt(1920^2 + 1200^2)/radial_delta)*(360/theta_delta) xy data points (for my laptop's specific resolution, but it could be even higher than that) that need to be updated as often as once per monitor refresh. So it could be a lot of transformations, and I need to save the CPU as much as I can so it doesn't slow down the animation.
For example, with a radial_delta of 1 and a theta_delta of 1, that is about 815,000 data points. Running at 60 Hz, that gives about 2*2*50 million float to double assignments and 2*50 million transformation calculations per second, which is enough to stress the CPU.
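The point count above can be checked with a few lines of arithmetic. This is only a sketch of the formula as quoted in the post; the function names are made up for illustration.

```cpp
#include <cmath>

// Number of spiral data points for a screen of width x height pixels:
// (screen diagonal / radial_delta) * (360 / theta_delta), as in the post.
double spiral_point_count(double width, double height,
                          double radial_delta, double theta_delta) {
    const double diagonal = std::sqrt(width * width + height * height);
    return (diagonal / radial_delta) * (360.0 / theta_delta);
}

// Points that must be re-transformed each second at a given refresh rate.
double points_per_second(double point_count, double refresh_hz) {
    return point_count * refresh_hz;
}
```

For 1920x1200 with both deltas set to 1, this comes to roughly 815,000 points, or a little under 49 million points per second at 60 Hz, which matches the "about 50 million" estimate.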
Edit 2
Here are some 11x17 prints on the wall I made of some of my Spiraloid images today using the color copier print service at Staples. Only about $15 for 10 images, and the lady was nice enough to give me 10 free sheets of glossy photo paper to use.
I actually use my own version which uses double. Float just doesn't really work at all for values above about 20,000, or you lose 1-pixel accuracy. And that's when not manipulating coordinates; longer chains of transformations basically don't work with float, period.
Even with double it's easy to hit accuracy problems when you're not careful about the order of operations.
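To make the float precision point concrete, here is a small sketch that measures the gap between neighbouring floats at a given magnitude and shows how adding a small offset to a large coordinate gets rounded. It does not by itself confirm the 20,000 figure above; it only shows the size of a single rounding step, which longer chains of operations then compound. The function names are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
float float_spacing(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Adds a small offset to a large base in float, then removes the base again.
// The result is the offset as the float format actually stored it.
float add_then_subtract(float base, float offset) {
    volatile float sum = base + offset;  // volatile forces rounding to float
    return sum - base;
}
```

At 20,000 the spacing is 2^-9 (about 0.00195), so an offset of 0.001 comes back as 0.001953125, nearly double what was added.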
So basically, I'd be for converting all floats in Allegro to double.
Would that impact the FPU performance at all? Are floats significantly faster than doubles?
Everyone on the web keeps giving out B.S. answers. I think we need to do an actual benchmark to get an answer to that.
The best I could find is this:
http://brandon.northcutt.net/article/double+VS+float+Speed+Comparison/20150625.html
In the synthetic test, double was ever-so-slightly slower. In a "real world" test, it was twice as slow.
Of course, "twice as slow" is meaningless to a 4.0 GHz server with 802,351 cores.
[edit] Someone linked this talk on a Reddit post:
I'm gonna watch it when I get back home. Supposedly it covers float/double performance.
The first thing I saw was -pg and gprof. That did not inspire confidence in me. gprof is hopelessly broken and no longer in development AFAIK (at least for MinGW).
I used a second method to evaluate a more "in the wild" performance and it yielded interesting results. For this method I compiled the program without the CPU profiling switch "-pg" and then made two binaries, one which ran only the float benchmark and one that ran only the double benchmark.
BASH COMMANDS
$ time ./float_bench
real 0m13.677s
user 0m13.665s
sys 0m0.012s
$ time ./double_bench
real 0m30.670s
user 0m21.427s
sys 0m9.243s
These results carry far more weight with me. But do they mean I should sacrifice the precision of doubles for the speed of floats? I don't know.
Something to note is that no optimization flags were passed to the compiler. It might be worth retesting the second method with optimizations enabled. I'm not on Linux so I can't use 'time' to measure it though, and I don't know how to use high-performance counters on Windows yet.
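One portable way to time this without 'time' or platform-specific counters is std::chrono::steady_clock from the C++ standard library. The sketch below is not Brandon's benchmark; it is a made-up multiply-add loop (`time_multiply_add` is a hypothetical name) that can be instantiated for float and double and built at each optimization level.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Times one multiply-add pass over `count` elements of type T and returns
// the elapsed time in seconds. The running sum is written to `checksum` so
// the compiler cannot discard the loop as dead code.
template <typename T>
double time_multiply_add(std::size_t count, T& checksum) {
    std::vector<T> data(count, static_cast<T>(1.5));  // allocated outside the timed region
    const auto start = std::chrono::steady_clock::now();
    T sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += data[i] * static_cast<T>(0.5) + static_cast<T>(0.25);
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_multiply_add<float>(n, f_sum)` and `time_multiply_add<double>(n, d_sum)` with the same `n` gives two directly comparable timings; repeating the runs and averaging would reduce noise.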
Edit
I watched the slideshow, and it gave some juicy tidbits about new instruction sets like AVX and AVX2 and about how 'optimizations' on one architecture can be 'stalls' on another.
Double precision will be anywhere from slow to glacial on the GPU when compared to single precision; on CPUs (x86 at least), not so much.
GLM is a great math library. It's pretty much standalone, very portable, and has support for most everything you'd need for rendering. And it supports single and double precision matrices (and vectors, and so on).
Since Allegro's transforms are geared towards GPUs, or so I think, single precision is probably best.
I modified Brandon's benchmarking program (the second method) slightly and fixed a minor bug (he was initializing a float array with 0.0, not 0.0f), then compiled it with different optimization levels and ran the tests with 1000 calls.
Zip file of code and batch scripts :
BenchmarksAndProfiling.zip
Here are the results :
Here's the code I used :
As expected, -O0 took the longest. -O1, -O2, and -O3 were all comparable. Memory allocation and deallocation generally took twice the time for doubles as it did for floats (because they are twice as big). Deallocation times were constant across optimizations. Something important to note is that I used volatile for the memory allocation size so it couldn't be optimized away.
I used al_get_time for measurements. Allocation and deallocation can be quite costly, and should be avoided if possible. The math times are comparable on my laptop with any optimization other than -O0 (Intel i7-5700HQ @ 2.70 GHz).
I'm running Windows 10 64 bit and I wanted to test with -m64 architecture but mingw32 doesn't support it. 
Edit
TL;DR
Here's a table of the results including the allocations :
-O0 float  : 43.70ms per op = 22.88FPS
-O0 double : 54.04ms per op = 18.50FPS
-O1 float  : 25.87ms per op = 38.65FPS
-O1 double : 33.06ms per op = 30.25FPS
-O2 float  : 24.68ms per op = 40.52FPS
-O2 double : 32.02ms per op = 31.23FPS
-O3 float  : 24.46ms per op = 40.88FPS
-O3 double : 32.33ms per op = 30.93FPS
And a table of the results for just the computations :
-O0 float  : 23.59ms per op = 42.39FPS
-O0 double : 31.38ms per op = 31.87FPS
-O1 float  : 17.44ms per op = 57.34FPS
-O1 double : 17.55ms per op = 56.98FPS
-O2 float  : 16.70ms per op = 59.88FPS
-O2 double : 16.73ms per op = 59.77FPS
-O3 float  : 16.58ms per op = 60.31FPS
-O3 double : 16.78ms per op = 59.59FPS
So you can see that if you wanted to process 6220800 (1920x1200x3) floating point elements per second on my laptop's CPU, it would just barely keep up with a 60 Hz refresh rate with optimizations enabled. But the difference between single precision floating point math and double precision floating point math is almost negligible.
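The FPS figures in the results are just 1000 divided by the milliseconds per op, and the 60 Hz budget is 1000/60, or about 16.67 ms. A tiny sketch of that arithmetic, with hypothetical helper names:

```cpp
// Frames per second achievable if one op (one frame's worth of work)
// takes ms_per_op milliseconds.
double fps_from_ms(double ms_per_op) {
    return 1000.0 / ms_per_op;
}

// True when one op fits inside a single refresh interval.
bool fits_refresh(double ms_per_op, double refresh_hz) {
    return ms_per_op <= 1000.0 / refresh_hz;
}
```

By this measure the -O3 float time of 16.58 ms fits inside a 60 Hz frame, while the -O3 double time of 16.78 ms misses it by a small margin.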
Since Allegro's transforms are geared towards GPUs, or so I think, single precision is probably best.
OpenGL also supports half-precision floats and integer coordinate systems. I don't see any clear reason why Allegro shouldn't support them.
The Gamecube runs with integer math.
Now that OpenGL supports it, the Dolphin emulator was ported to integer math and tons of bugs have gone away.
https://dolphin-emu.org/blog/2014/03/15/pixel-processing-problems/
[edit]
ALSO, I had no idea there was a difference between 0.0 and 0.0f. There's REALLY such a thing as a float vs double literal, and the compiler will silently convert them if you have the wrong one. ... I think?
This is insanity!
Bringing back to another of my threads: Somehow, a std::string implicitly converting to a c_string is terrible, but doubles to floats, and floats to ints are OKAY being implicit?! COME ON C++. COME ON.
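For what it's worth, the literal types and the silent conversions can be checked directly. A small sketch in standard C++ (the function names are made up for illustration):

```cpp
#include <type_traits>

// An unsuffixed floating literal is a double; the f suffix makes it a float.
static_assert(std::is_same<decltype(0.0), double>::value, "0.0 is a double");
static_assert(std::is_same<decltype(0.0f), float>::value, "0.0f is a float");

// 0.1 is not exactly representable, so the float and double literals hold
// different approximations and compare unequal once both are doubles.
bool literals_differ() {
    float f = 0.1f;
    double d = 0.1;
    return static_cast<double>(f) != d;
}

// Implicit double -> int conversion compiles and discards the fraction.
int truncate_to_int(double value) {
    int i = value;    // implicit narrowing, allowed with = initialization
    // int j{value};  // brace initialization rejects this narrowing
    return i;
}
```

Brace initialization is the one place where the language refuses narrowing conversions, and many compilers can warn about the rest with a flag such as GCC's -Wconversion.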
See my last edit for FPS results of ops with and without allocations included.
ALLEGRO_TRANSFORM indeed has floats inside it, and since its internals are public, we're kind of stuck with it that way. It is that way primarily because that's what is supported across platforms (the culprit in this case is Direct3D).
OpenGL also supports half-precision floats and integer coordinate systems. I don't see any clear reason why Allegro shouldn't support them.
If I remember correctly, half precision is only useful on mobile platforms. It's a no-op on most desktop GPUs. Similarly, native integer support is slow, like doubles.
But most of all, such features are useless for anyone using Allegro for rendering.
The Gamecube runs with integer math.
The classic Xbox had a bizarre programmable GPU unlike otherwise equivalent Nvidia chips before and after. The SNES had a terribly weak CPU, only a minor step up from the NES. The Nintendo 64 was pretty much an SGI workstation. The Wii has a small ARM processor on the same die as the GPU that controls various security and I/O processes.
Consoles used to have strange quirks unlike PCs, and that was nice, but that doesn't have any relevance to modern hardware.
It would be possible to create a function called al_transform_coordinates_d that took double pointers though. That would at least save the allocation of two floats. But I guess if they're on the stack it wouldn't matter, even in a heavy loop. Don't mind me. Just thinking out loud.
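A rough sketch of what such a function might look like. `Transform2D` is a stand-in struct, not the real ALLEGRO_TRANSFORM, and the index convention (translation in row 3, result computed as x*m[0][0] + y*m[1][0] + m[3][0]) is an assumption about how the float version works. `transform_coordinates_d` is a hypothetical name, not an existing Allegro function.

```cpp
// Stand-in for a transform with a public 4x4 float matrix (assumed layout).
struct Transform2D {
    float m[4][4];
};

// Hypothetical double-pointer variant: the matrix entries are still floats,
// but the inputs, outputs, and intermediate arithmetic are doubles, so no
// temporary float variables are needed at the call site.
void transform_coordinates_d(const Transform2D* t, double* x, double* y) {
    const double in_x = *x;
    const double in_y = *y;
    *x = in_x * t->m[0][0] + in_y * t->m[1][0] + t->m[3][0];
    *y = in_x * t->m[0][1] + in_y * t->m[1][1] + t->m[3][1];
}
```

Note that this only removes the float round trip on the coordinates; the matrix entries themselves still carry float precision.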
My only concern is this part of my code :
GeneratePlotData only gets called if the theta_delta or the radial_delta change, as that affects the number of data points in the spiral. But the transform and the modified coordinates change every time the rotation changes, which is quite often in my program.
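The split described here (regenerate the source points rarely, re-transform them on every rotation change) could be sketched like this in plain double math. This is not the Spiraloid code; `Point` and `transform_points` are hypothetical names, and the output buffer is reused so the per-frame pass does not allocate once it has reached full size.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x;
    double y;
};

// Rotates, scales, and offsets every cached source point into `out`.
// `source` only changes when the deltas change; `out` is rewritten whenever
// the rotation, scale, or offset changes.
void transform_points(const std::vector<Point>& source, std::vector<Point>& out,
                      double theta_radians, double scale,
                      double offset_x, double offset_y) {
    // sin and cos are computed once per pass, not once per point.
    const double c = std::cos(theta_radians) * scale;
    const double s = std::sin(theta_radians) * scale;
    out.resize(source.size());  // no reallocation once capacity is reached
    for (std::size_t i = 0; i < source.size(); ++i) {
        out[i].x = source[i].x * c - source[i].y * s + offset_x;
        out[i].y = source[i].x * s + source[i].y * c + offset_y;
    }
}
```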