I might be misusing deferred drawing here, but when I use it with OpenGL the CPU usage goes to about 50%; without deferred drawing it only uses a few percent. It works fine with DirectX.
I can't seem to reproduce it here. This is the oprofile output of your example with held drawing:
13268  60.7120  libnvidia-glcore.so.260.19.06  /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06
 1082   4.9510  libGL.so.260.19.06             /usr/lib/nvidia-current/libGL.so.260.19.06
 1054   4.8229  liballegro-debug.so.5.1.0      draw_quad
  575   2.6311  ld-2.12.1.so                   __tls_get_addr
  437   1.9996  liballegro-debug.so.5.1.0      tls_get
  345   1.5787  liballegro-debug.so.5.1.0      _draw_tinted_rotated_scaled_bitmap_region
  300   1.3727  liballegro-debug.so.5.1.0      al_compose_transform
  259   1.1851  liballegro-debug.so.5.1.0      al_identity_transform
  250   1.1440  libc-2.12.1.so                 memcpy
  230   1.0524  liballegro-debug.so.5.1.0      al_rotate_transform
  230   1.0524  liballegro-debug.so.5.1.0      al_transform_coordinates
This is without:
 9394  59.3618  libnvidia-glcore.so.260.19.06  /usr/lib/nvidia-current/libnvidia-glcore.so.260.19.06
  749   4.7330  libGL.so.260.19.06             /usr/lib/nvidia-current/libGL.so.260.19.06
  458   2.8942  liballegro-debug.so.5.1.0      draw_quad
  454   2.8689  libc-2.12.1.so                 fgetc
  405   2.5592  libX11.so.6.3.0                /usr/lib/libX11.so.6.3.0
  229   1.4471  ld-2.12.1.so                   __tls_get_addr
  209   1.3207  libc-2.12.1.so                 __GI___strcmp_ssse3
  169   1.0679  liballegro-debug.so.5.1.0      _draw_tinted_rotated_scaled_bitmap_region
  160   1.0111  libc-2.12.1.so                 memcpy
One difference you can see is that with held drawing we do the transformations in software, so for each bitmap there are calls to the transformation functions (al_compose_transform, al_identity_transform, al_rotate_transform, al_transform_coordinates), and they show up in the profile. Without held drawing the transformations are done on the GPU, so they don't show up.
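To make the "transformations in software" point concrete, here is a minimal sketch of what per-bitmap CPU work of this kind looks like: build a small 2D transform and push the four corners of a quad through it. This is not Allegro's actual code. The type and function names (Affine2D, affine_rotate, transform_quad and so on) are hypothetical, and the rotation takes a precomputed cosine and sine only to keep the sketch free of library dependencies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D affine transform (not an Allegro type):
 *   x' = a*x + c*y + tx
 *   y' = b*x + d*y + ty
 */
typedef struct {
    float a, b, c, d, tx, ty;
} Affine2D;

typedef struct {
    float x, y;
} Vec2;

/* Transform that leaves every point unchanged. */
static Affine2D affine_identity(void)
{
    Affine2D t = {1.0f, 0.0f, 0.0f, 1.0f, 0.0f, 0.0f};
    return t;
}

/* Append a rotation to t. The caller passes cos and sin of the angle. */
static Affine2D affine_rotate(Affine2D t, float cs, float sn)
{
    Affine2D r;
    r.a  = cs * t.a  - sn * t.b;
    r.b  = sn * t.a  + cs * t.b;
    r.c  = cs * t.c  - sn * t.d;
    r.d  = sn * t.c  + cs * t.d;
    r.tx = cs * t.tx - sn * t.ty;
    r.ty = sn * t.tx + cs * t.ty;
    return r;
}

/* Append a translation to t. */
static Affine2D affine_translate(Affine2D t, float dx, float dy)
{
    t.tx += dx;
    t.ty += dy;
    return t;
}

/* Apply t to a single point. */
static Vec2 affine_apply(const Affine2D *t, Vec2 p)
{
    Vec2 out;
    out.x = t->a * p.x + t->c * p.y + t->tx;
    out.y = t->b * p.x + t->d * p.y + t->ty;
    return out;
}

/* Transform the four corners of a w x h quad on the CPU.
 * Corner order: top-left, top-right, bottom-right, bottom-left. */
static void transform_quad(const Affine2D *t, float w, float h, Vec2 out[4])
{
    const Vec2 corners[4] = {{0.0f, 0.0f}, {w, 0.0f}, {w, h}, {0.0f, h}};
    size_t i;

    assert(t != NULL && out != NULL);
    for (i = 0; i < 4; i++)
        out[i] = affine_apply(t, corners[i]);
}
```

In a batched path, work like this runs once per bitmap before the vertices are queued, which is why functions of this kind appear in a CPU profile. At roughly 1% each in the profile above, though, they are a small share of the total.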
However, I can't see this causing 50% CPU usage; in my case it makes no difference to the end result. It would also show up with D3D.
It would be interesting to see profiling output, but I don't think you can get that on Windows. Maybe there's something in allegro.log that can give a hint, so if you can compile it with the debug version of Allegro and attach the allegro.log file, it might help find the problem.
Log attached.
Hm, nothing out of the ordinary. How many cores do you have? I assume 50% CPU means half of them are spin-locking. Maybe there's some threading issue with our WGL implementation. I'm not sure how to look into it without being able to reproduce it.
Can someone else reproduce this?
Yes, the CPU has two cores.
EDIT: Neither core is completely saturated; one is at about 75% and the other at about 25%.
That could just mean that the kernel is switching load between the cores.
I wonder if it has something to do with tls_get (which shows up as 2% in my profiling results with held drawing), though I don't see why the OpenGL and D3D drivers would call it a different number of times. Let's wait and see if someone else can reproduce it. I'll also try it on my netbook.