*Title should say CPU Usage
For my game I want to use GLSL shaders, so I explicitly set the ALLEGRO_OPENGL flag when creating a display. On Windows, however, this makes the CPU usage fluctuate quite a bit compared to Direct3D, and it's consistent across everything I compile with Allegro. Is this normal, and is it actually consuming as much CPU as it reports?
This happens on my laptop, which has an NVidia GeForce 9400M video card. It fluctuates on my PC as well, and both have the latest drivers.
Is it because there is a substantial advantage to using Direct3D on Windows? If so, would Cg be a better alternative?
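For reference, the setup in question is just this (a minimal sketch, assuming Allegro 5; the window size is arbitrary):

```c
/* Minimal sketch: explicitly request the OpenGL backend before
 * creating the display. Without this flag, Allegro defaults to
 * Direct3D on Windows. */
#include <allegro5/allegro.h>

int main(void)
{
    if (!al_init())
        return 1;

    al_set_new_display_flags(ALLEGRO_OPENGL);

    ALLEGRO_DISPLAY *display = al_create_display(640, 480);
    if (!display)
        return 1;

    al_rest(1.0);  /* keep the window up briefly so CPU use can be observed */
    al_destroy_display(display);
    return 0;
}
```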
Thanks
Can you give me an example I can compile to test it? I have the same 9400M in my laptop. It really should not be doing that; if it is, there might be a bug somewhere. One thing that might be an issue is reloading the matrices all the time. I noticed that using PROGRAMMABLE_PIPELINE runs a bit slower than not using shaders (on iPhone); my first guess in that case was the cost of transferring matrices into shaders. Anyway, I will test an example if you have one.
I've attached an exe I made which uses a shader, although the fluctuation happens even when shaders are not used. CPU usage goes anywhere from 5% to 50% on mine, but what it's doing is rather simple. Without shaders, in D3D, this runs consistently around 3%.
The source is a test bed and a bit of a mess, but it might help:
Thanks for testing.
Can you include all the source that's needed to compile it? Have you tried profiling it to see where most of the time is spent? It seems to take around 100% CPU most of the time here.
Here is the source code for my SKALE API thus far; it is meant to be compiled with MSVC. Thanks
Sorry, I don't have an Allegro+MSVC environment set up. Will it compile with MinGW?
It uses a std::exception constructor that takes a string, which is non-standard (for now), and you need to add the solution directory as an include directory since I use includes like "SKALE/blah..." Other than that it should build with g++ / MinGW.
I get a whole bunch of errors:
In file included from Animation.hpp:4,
from Animation.cpp:1:
../SKALE/KeyFrame.hpp:7: error: ‘size_t’ does not name a type
../SKALE/KeyFrame.hpp:9: error: ‘size_t’ has not been declared
../SKALE/KeyFrame.hpp:12: error: ISO C++ forbids declaration of ‘size_t’ with no type
../SKALE/KeyFrame.hpp:12: error: expected ‘;’ before ‘&’ token
In file included from Bone.cpp:1:
../SKALE/Bone.hpp:19: error: field ‘mName’ has incomplete type
../SKALE/Bone.hpp:42: error: default argument for parameter of type ‘const std::string&’ has type ‘const char [1]’
Bone.cpp: In constructor ‘skl::Bone::Bone(float, float, float, float, float, float, bool, const std::string&, skl::Bone*)’:
Bone.cpp:15: error: class ‘skl::Bone’ does not have any field named ‘mName’
Bone.cpp: In member function ‘const std::string& skl::Bone::getName() const’:
Bone.cpp:149: error: ‘mName’ was not declared in this scope
Bone.cpp: In member function ‘void skl::Bone::setName(const std::string&)’:
Bone.cpp:186: error: ‘mName’ was not declared in this scope
/usr/include/c++/4.2.1/bits/stl_algo.h: In function ‘const _Tp& std::__median(const _Tp&, const _Tp&, const _Tp&) [with _Tp = KeyFrame]’:
/usr/include/c++/4.2.1/bits/stl_algo.h:2758: instantiated from ‘void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<KeyFrame*, std::vector<KeyFrame, std::allocator<KeyFrame> > >, _Size = long int]’
/usr/include/c++/4.2.1/bits/stl_algo.h:2829: instantiated from ‘void std::sort(_RandomAccessIterator, _RandomAccessIterator) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<KeyFrame*, std::vector<KeyFrame, std::allocator<KeyFrame> > >]’
Bone.cpp:200: instantiated from here
/usr/include/c++/4.2.1/bits/stl_algo.h:91: error: passing ‘const KeyFrame’ as ‘this’ argument of ‘bool KeyFrame::operator<(const KeyFrame&)’ discards qualifiers
/usr/include/c++/4.2.1/bits/stl_algo.h:92: error: passing ‘const KeyFrame’ as ‘this’ argument of ‘bool KeyFrame::operator<(const KeyFrame&)’ discards qualifiers
/usr/include/c++/4.2.1/bits/stl_algo.h:94: error: passing ‘const KeyFrame’ as ‘this’ argument of ‘bool KeyFrame::operator<(const KeyFrame&)’ discards qualifiers
/usr/include/c++/4.2.1/bits/stl_algo.h:98: error: passing ‘const KeyFrame’ as ‘this’ argument of ‘bool KeyFrame::operator<(const KeyFrame&)’ discards qualifiers
/usr/include/c++/4.2.1/bits/stl_algo.h:100: error: passing ‘const KeyFrame’ as ‘this’ argument of ‘bool KeyFrame::operator<(const KeyFrame&)’ discards qualifiers
In file included from ../SKALE/IKSolver.hpp:4,
from IKSolver.cpp:1:
../SKALE/Bone.hpp:19: error: field ‘mName’ has incomplete type
../SKALE/Bone.hpp:42: error: default argument for parameter of type ‘const std::string&’ has type ‘const char [1]’
In file included from KeyFrame.cpp:1:
KeyFrame.hpp:7: error: ‘size_t’ does not name a type
KeyFrame.hpp:9: error: ‘size_t’ has not been declared
KeyFrame.hpp:12: error: ISO C++ forbids declaration of ‘size_t’ with no type
KeyFrame.hpp:12: error: expected ‘;’ before ‘&’ token
KeyFrame.cpp:4: error: ‘size_t’ has not been declared
KeyFrame.cpp: In constructor ‘KeyFrame::KeyFrame(float, int)’:
KeyFrame.cpp:5: error: class ‘KeyFrame’ does not have any field named ‘mFrame’
KeyFrame.cpp: At global scope:
KeyFrame.cpp:20: error: expected initializer before ‘&’ token
KeyFrame.cpp: In member function ‘bool KeyFrame::operator<(const KeyFrame&)’:
KeyFrame.cpp:27: error: ‘getFrame’ was not declared in this scope
KeyFrame.cpp:27: error: ‘const class KeyFrame’ has no member named ‘getFrame’
In file included from Skeleton.hpp:4,
from Skeleton.cpp:1:
../SKALE/Bone.hpp:19: error: field ‘mName’ has incomplete type
../SKALE/Bone.hpp:42: error: default argument for parameter of type ‘const std::string&’ has type ‘const char [1]’
/usr/include/c++/4.2.1/bits/ios_base.h: In copy constructor ‘std::basic_ios<char, std::char_traits<char> >::basic_ios(const std::basic_ios<char, std::char_traits<char> >&)’:
/usr/include/c++/4.2.1/bits/ios_base.h:779: error: ‘std::ios_base::ios_base(const std::ios_base&)’ is private
/usr/include/c++/4.2.1/iosfwd:55: error: within this context
/usr/include/c++/4.2.1/iosfwd: In copy constructor ‘std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(const std::basic_ofstream<char, std::char_traits<char> >&)’:
/usr/include/c++/4.2.1/iosfwd:92: note: synthesized method ‘std::basic_ios<char, std::char_traits<char> >::basic_ios(const std::basic_ios<char, std::char_traits<char> >&)’ first required here
/usr/include/c++/4.2.1/streambuf: In copy constructor ‘std::basic_filebuf<char, std::char_traits<char> >::basic_filebuf(const std::basic_filebuf<char, std::char_traits<char> >&)’:
/usr/include/c++/4.2.1/streambuf:794: error: ‘std::basic_streambuf<_CharT, _Traits>::basic_streambuf(const std::basic_streambuf<_CharT, _Traits>&) [with _CharT = char, _Traits = std::char_traits<char>]’ is private
/usr/include/c++/4.2.1/iosfwd:86: error: within this context
/usr/include/c++/4.2.1/iosfwd: In copy constructor ‘std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(const std::basic_ofstream<char, std::char_traits<char> >&)’:
/usr/include/c++/4.2.1/iosfwd:92: note: synthesized method ‘std::basic_filebuf<char, std::char_traits<char> >::basic_filebuf(const std::basic_filebuf<char, std::char_traits<char> >&)’ first required here
Skeleton.cpp: In member function ‘bool skl::Skeleton::save(const std::string&) const’:
Skeleton.cpp:138: note: synthesized method ‘std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(const std::basic_ofstream<char, std::char_traits<char> >&)’ first required here
Skeleton.cpp: In member function ‘bool skl::Skeleton::_sortLinesByLevel(std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::vector<std::pair<int, std::string*>, std::allocator<std::pair<int, std::string*> > >&)’:
Skeleton.cpp:345: warning: NULL used in arithmetic
No time to dig into it right now. Can you at least build me a D3D version of the exe so I can see if it's any better?
Here is the D3D version (no shader)
Thanks
Well, that's a VERY apples-and-oranges test. The GL version has a bunch of shaders and the D3D version is just drawing a man-shaped rope. I can't come to any conclusion with just that. I'd say port your shaders to HLSL and try it; it's not much different from GLSL. Or use Cg.
Well, right now I just compiled ex_bitmap once with D3D and once with OpenGL, and the OpenGL version clearly takes much more CPU and fluctuates in CPU usage. I checked in Process Explorer and there is a clear fluctuation in the GL version. In a game like M.A.R.S. (http://mars-game.sourceforge.net/), which uses pure OpenGL and shaders, there is no fluctuation on my laptop.
Also, profiling all my executables reveals that 95% of the time is spent in al_wait_for_event().
I can't reproduce it here with ex_bitmap. It's the same, 0-2% with both OpenGL and Direct3D. I tried with Aero on and off and it's the same. What type of percentages are you seeing for ex_bitmap?
With D3D:
1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1
With al_set_new_display_flags(ALLEGRO_OPENGL):
5, 6, 5, 6, 51, 39, 23, 16, 5, 6, 27, 39, 17, 6, 5
Have you checked for a driver update for your card recently?
Those are pretty crazy numbers. Do you have some strange OpenGL settings in your gfx card settings panel?
I updated to the latest ones this morning. I should note, though, that on ex_bitmap I bumped the frame rate from 30 to 60, and this seems to make the difference.
Actually, on my game, lowering the frame rate to 30 frames per second fixes the problem. I'm confused now.
So at 60 frames per second, GL fluctuates like mad, but at 30 it is totally stable.
Any background crap running? Including windows updates and crap?
Nope. I tried rebooting and killing several processes; this behavior is consistent. Changing the timer from 1.0f / 60.0f to 1.0f / 30.0f makes it stable. All of them are like this.
What happens if you, say, bump it to 100 fps and keep a record of the fps? After 10 seconds, is it running at the full 100 fps?
it causes the cpu usage to fluctuate quite a bit compared to Direct3D.
It occurs to me that you might be seeing some kind of aliasing (google Nyquist frequency and moiré patterns). Try tweaking either the loop speed of your program or the update period of Process Explorer and see if the readings change.
[EDIT]
I was going to post this an hour ago but canceled, the reply box offered to put it back.
When I set it to 100 fps, I get around 62 fps and 100% CPU usage (both cores!). When I lower it below 60, I get the desired frame rate.
In D3D I get my 100fps no problem.
On my PC with a GTX 275 it still uses 100% and only gets 60 fps (it's supposed to be getting 100 fps).
I attached it if you want to try.
Can you see if vsync is force enabled in your opengl settings and try to turn it off?
It was not force-enabled; it said it was going by the 3D application's settings, but setting it to force off solved the problem. Still, though, why didn't other GL games give me this problem?
Oh and thanks again to everyone who helped me get to the bottom of this!
I'm not sure. Did you mention what version of Windows you're using? I'm on Windows 7 64-bit. I have the same GPU, and even forcing vsync on I don't see any fluctuation.
Windows 7 32-bit on the laptop; the driver version is 275.33.
On the PC it is 64-bit with a GTX 275 and an i7 @ 3.0 GHz.
Both machines have the latest drivers.
If I force vsync on for either of them, the problem comes right back.
Edit: Now the M.A.R.S. game has the completely opposite behavior: when I force off, it shoots to 50%; when I force on, it is normal, around 15%.
The latter behaviour (M.A.R.S.'s) makes sense; the Allegro behaviour doesn't. I have a fuzzy memory of someone talking about vsync using up mad CPU, but I don't remember who it was or all the details. The fact that it isn't happening with M.A.R.S. makes it a little strange, though... It's possible the Windows code does something strange with vsync. I can have a look at it, but I've never messed around with WGL.
EDIT: The WGL display driver has no mention of "vsync" or even "sync" anywhere. I'll look into it tonight if I can. But maybe someone else already has a better idea of how vsync works with WGL?
This could be useful:
http://stackoverflow.com/questions/589064/how-to-enable-vertical-sync-in-opengl
If you are working on Windows, you have to use extensions to access the wglSwapIntervalEXT function. It is declared in wglext.h; you will also want to download the glext.h file. The wglext header declares all entry points for Windows-specific extensions, and all such functions start with the wgl prefix. For more info about all published extensions, you can look in the OpenGL Extension Registry.
wglSwapIntervalEXT is from the WGL_EXT_swap_control extension. It lets you specify the minimum number of video frames before each buffer swap; usually it is used for vertical synchronization (by setting the swap interval to 1). More info about the whole extension can be found here. Before using this function you need to query whether your card supports WGL_EXT_swap_control and then obtain a pointer to the function using wglGetProcAddress.
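Sketched out, the approach described above looks roughly like this (Windows-only; assumes an OpenGL context is already current on the calling thread, since wglGetProcAddress only works with a current context; strictly you'd also check the WGL extension string for WGL_EXT_swap_control, but a NULL check on the pointer is a common shortcut):

```c
#include <windows.h>

/* Function pointer type for wglSwapIntervalEXT (WGL_EXT_swap_control) */
typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

/* Enable (1) or disable (0) vsync, if the extension is available. */
static void set_vsync(int enabled)
{
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(enabled ? 1 : 0);  /* 1 = wait for vblank, 0 = off */
}
```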
Thanks. It should be very easy to enable/disable vsync with this function. Allegro's extension handling already loads wglSwapIntervalEXT if it's supported, so it's just a matter of calling it (Allegro can check whether extensions are available very easily too) with 1 or 0, depending on whether the user requests it on or off (I think on should be the default). I don't know if it would make a difference in your case, but it's worth a try.
Nice!!!
Calling wglSwapIntervalEXT(0); right after creating the display totally solves the problem, so I think Allegro is missing this, and I'm sure committing this change will solve the issue!
Well it's more complicated than that, but I just committed a patch to 5.1 that makes WGL honor the ALLEGRO_VSYNC display option. So it should work for you to force it off or on.
Great thanks!
I was also wondering if it's normal that I can render a lot more bitmaps with D3D. I made a test, and rendering a 20x20 grid of bitmaps consumed around 60% CPU in OpenGL but only about 12% in D3D.
On some hardware that is normal. Can you give me another simple compilable example, or does ex_draw_bitmap do the same thing, so I can test that?
If you add a for loop of 400 draws it should reproduce it.
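Something like this, i.e. a 20x20 grid of the same bitmap redrawn at 60 fps (a sketch, assuming Allegro 5 with the image addon; "mysha.pcx" is the image the stock examples ship with, and the grid spacing is arbitrary):

```c
#include <allegro5/allegro.h>
#include <allegro5/allegro_image.h>

int main(void)
{
    al_init();
    al_init_image_addon();
    al_install_keyboard();

    /* toggle this line to compare the two backends */
    al_set_new_display_flags(ALLEGRO_OPENGL);

    ALLEGRO_DISPLAY *display = al_create_display(640, 640);
    ALLEGRO_BITMAP *bmp = al_load_bitmap("mysha.pcx");
    if (!display || !bmp)
        return 1;

    ALLEGRO_TIMER *timer = al_create_timer(1.0 / 60.0);  /* 60 fps, where the
                                                            fluctuation shows up */
    ALLEGRO_EVENT_QUEUE *queue = al_create_event_queue();
    al_register_event_source(queue, al_get_timer_event_source(timer));
    al_register_event_source(queue, al_get_keyboard_event_source());
    al_start_timer(timer);

    for (;;) {
        ALLEGRO_EVENT ev;
        al_wait_for_event(queue, &ev);
        if (ev.type == ALLEGRO_EVENT_KEY_DOWN)
            break;
        if (ev.type == ALLEGRO_EVENT_TIMER) {
            int i;
            al_clear_to_color(al_map_rgb(0, 0, 0));
            for (i = 0; i < 400; i++)   /* the "for loop of 400" */
                al_draw_bitmap(bmp, (i % 20) * 32, (i / 20) * 32, 0);
            al_flip_display();
        }
    }
    al_destroy_display(display);
    return 0;
}
```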
D3D: 5%
GL: 30%
But once again, the big difference only appears when this example runs at 60 fps. I also added the swap interval call.