Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » DevCpp Learning to profile with

This thread is locked; no one can reply to it. rss feed Print
DevCpp Learning to profile with
Paul whoknows
Member #5,081
September 2004
avatar

I am using devcpp since a few days ago.
I have 2 functions for drawing anti-aliased sprites to a memory buffer, one is allegro's draw_trans_sprite(), and the other one is Fladimir's AlphaBlend32().
I want to know how much faster the fastest one is.

I am linking to liballp.a, and the compiler option -pg.
Attached is the source file with an alpha sprite for testing.

Here is the loop, I am calling both functions at the same time, I think that's the best way for testing, any advices?

1int main() {
2 /* Setup Allegro */
3 allegro_init();
4 install_keyboard();
5 install_mouse();
6 
7 /*set color depth in 32bpp */
8 set_color_depth( 32 );
9 set_gfx_mode( GFX_AUTODETECT_WINDOWED, 800, 600, 0, 0 );
10 //set_gfx_mode( GFX_AUTODETECT, 800, 600, 0, 0 );
11
12 BITMAP *buffer = create_bitmap( SCREEN_W, SCREEN_H );
13 clear_to_color(buffer, makecol(255, 0, 0));
14
15 /* must be a 32-bit bitmap with an alpha channel! */
16 BITMAP *sprite = load_bitmap( "olafito.tga", NULL );
17 if(sprite == NULL ) allegro_message("not found\n");
18
19set_alpha_blender();
20 
21do {
22 
23
24 /* draw with AlphaBlend32*/
25 AlphaBlend32( sprite, buffer, mouse_x-sprite->w/2, mouse_y-sprite->h/2, 255 );
26
27 /* draw with draw_trans_sprite */
28 draw_trans_sprite(buffer, sprite, mouse_x-sprite->w/2, mouse_y-sprite->h/2);
29
30
31 blit( buffer, screen, 0, 0, 0, 0, SCREEN_W, SCREEN_H );
32 
33 
34}while (!key[KEY_ESC]);
35 return 0;
36}

.
.

____

"The unlimited potential has been replaced by the concrete reality of what I programmed today." - Jordan Mechner.

HoHo
Member #4,534
April 2004
avatar

Linking with profiling versions of Allegro is only meaningful if you want to profile Allegro itself. If you want to profile your program then compile it with -pg and link with regular allegro. That way allegro won't be slowed down by being profiled.

In case of your program I'd just have two loops that repeat one or the other function for million times. The one who takes less time wins. Profilers won't be much of use to you, unless you intend to improve implementations of these functions.

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Paul whoknows
Member #5,081
September 2004
avatar

Quote:

Linking with profiling versions of Allegro is only meaningful if you want to profile Allegro itself. If you want to profile your program then compile it with -pg and link with regular allegro. That way allegro won't be slowed down by being profiled.

I did not know that, in MSVC6 I was using the debug and profile version of allegro for debuging and profiling my program, that was not really necesary?

Back on topic, I changed my loop code following your advice.
I tested the program with draw_trans_sprite and AlphaBlend32, one first, and the other later.

#SelectExpand
1set_alpha_blender(); 2int counter=0; 3do { 4 5 6 /* draw with AlphaBlend32*/ 7 AlphaBlend32( sprite, buffer, mouse_x-sprite->w/2, mouse_y-sprite->h/2, 255 ); 8 9 /* draw with draw_trans_sprite */ 10 // draw_trans_sprite(buffer, sprite, mouse_x-sprite->w/2, mouse_y-sprite->h/2); 11 12 13 blit( buffer, screen, 0, 0, 0, 0, SCREEN_W, SCREEN_H ); 14 counter++; 15 16}while (!key[KEY_ESC] && counter<10000); 17 return 0; 18}

.
.
This is what I obtained for AlphaBlend32:
.
.

#SelectExpand
1Each sample counts as 0.01 seconds. 2 % cumulative self self total 3 time seconds seconds calls ms/call ms/call name 4 54.97 8.46 8.46 10000 0.85 1.54 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) 5 28.20 12.80 4.34 202430000 0.00 0.00 BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) 6 16.50 15.34 2.54 768000000 0.00 0.00 geta32 7 0.19 15.37 0.03 2400000 0.00 0.00 bmp_write_line 8 0.13 15.39 0.02 _mangled_main() 9 0.00 15.39 0.00 10000 0.00 0.00 CallEmms() 10 0.00 15.39 0.00 10000 0.00 0.00 acquire_bitmap 11 0.00 15.39 0.00 10000 0.00 0.00 bmp_unwrite_line 12 0.00 15.39 0.00 10000 0.00 0.00 release_bitmap 13 0.00 15.39 0.00 1 0.00 0.00 WinMain@16 14 0.00 15.39 0.00 1 0.00 0.00 clear_to_color 15 16 % the percentage of the total running time of the 17time program used by this function. 18 19cumulative a running sum of the number of seconds accounted 20 seconds for by this function and those listed above it. 21 22 self the number of seconds accounted for by this 23seconds function alone. This is the major sort for this 24 listing. 25 26calls the number of times this function was invoked, if 27 this function is profiled, else blank. 28 29 self the average number of milliseconds spent in this 30ms/call function per call, if this function is profiled, 31 else blank. 32 33 total the average number of milliseconds spent in this 34ms/call function and its descendents per call, if this 35 function is profiled, else blank. 36 37name the name of the function. This is the minor sort 38 for this listing. The index shows the location of 39 the function in the gprof listing. If the index is 40 in parenthesis it shows where it would appear in 41 the gprof listing if it were to be printed. 42 43 Call graph (explanation follows) 44 45 46granularity: each sample hit covers 4 byte(s) for 0.06% of 15.39 seconds 47 48index % time self children called name 49 <spontaneous> 50[1] 100.0 0.02 15.37 _mangled_main() [1] 51 8.46 6.91 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 52 0.00 0.00 1/1 clear_to_color [13] 53----------------------------------------------- 54 8.46 6.91 10000/10000 _mangled_main() [1] 55[2] 99.9 8.46 6.91 10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 56 4.34 0.00 202430000/202430000 BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) [3] 57 2.54 0.00 768000000/768000000 geta32 [4] 58 0.03 0.00 2400000/2400000 bmp_write_line [5] 59 0.00 0.00 10000/10000 acquire_bitmap [9] 60 0.00 0.00 10000/10000 CallEmms() [8] 61 0.00 0.00 10000/10000 bmp_unwrite_line [10] 62 0.00 0.00 10000/10000 release_bitmap [11] 63----------------------------------------------- 64 4.34 0.00 202430000/202430000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 65[3] 28.2 4.34 0.00 202430000 BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) [3] 66----------------------------------------------- 67 2.54 0.00 768000000/768000000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 68[4] 16.5 2.54 0.00 768000000 geta32 [4] 69----------------------------------------------- 70 0.03 0.00 2400000/2400000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 71[5] 0.2 0.03 0.00 2400000 bmp_write_line [5] 72----------------------------------------------- 73 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 74[8] 0.0 0.00 0.00 10000 CallEmms() [8] 75----------------------------------------------- 76 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 77[9] 0.0 0.00 0.00 10000 acquire_bitmap [9] 78----------------------------------------------- 79 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 80[10] 0.0 0.00 0.00 10000 bmp_unwrite_line [10] 81----------------------------------------------- 82 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 83[11] 0.0 0.00 0.00 10000 release_bitmap [11] 84----------------------------------------------- 85 0.00 0.00 1/1 main [76] 86[12] 0.0 0.00 0.00 1 WinMain@16 [12] 87----------------------------------------------- 88 0.00 0.00 1/1 _mangled_main() [1] 89[13] 0.0 0.00 0.00 1 clear_to_color [13] 90-----------------------------------------------

.
.
This is what I obtained for draw_trans_sprite
.
.

#SelectExpand
1Flat profile: 2 3Each sample counts as 0.01 seconds. 4 % cumulative self self total 5 time seconds seconds calls Ts/call Ts/call name 6100.00 0.01 0.01 _mangled_main() 7 0.00 0.01 0.00 10000 0.00 0.00 draw_trans_sprite 8 0.00 0.01 0.00 1 0.00 0.00 WinMain@16 9 0.00 0.01 0.00 1 0.00 0.00 clear_to_color 10 11 % the percentage of the total running time of the 12time program used by this function. 13 14cumulative a running sum of the number of seconds accounted 15 seconds for by this function and those listed above it. 16 17 self the number of seconds accounted for by this 18seconds function alone. This is the major sort for this 19 listing. 20 21calls the number of times this function was invoked, if 22 this function is profiled, else blank. 23 24 self the average number of milliseconds spent in this 25ms/call function per call, if this function is profiled, 26 else blank. 27 28 total the average number of milliseconds spent in this 29ms/call function and its descendents per call, if this 30 function is profiled, else blank. 31 32name the name of the function. This is the minor sort 33 for this listing. The index shows the location of 34 the function in the gprof listing. If the index is 35 in parenthesis it shows where it would appear in 36 the gprof listing if it were to be printed. 37 38 Call graph (explanation follows) 39 40 41granularity: each sample hit covers 4 byte(s) for 100.00% of 0.01 seconds 42 43index % time self children called name 44 <spontaneous> 45[1] 100.0 0.01 0.00 _mangled_main() [1] 46 0.00 0.00 10000/10000 draw_trans_sprite [4] 47 0.00 0.00 1/1 clear_to_color [6] 48----------------------------------------------- 49 0.00 0.00 10000/10000 _mangled_main() [1] 50[4] 0.0 0.00 0.00 10000 draw_trans_sprite [4] 51----------------------------------------------- 52 0.00 0.00 1/1 main [76] 53[5] 0.0 0.00 0.00 1 WinMain@16 [5] 54----------------------------------------------- 55 0.00 0.00 1/1 _mangled_main() [1] 56[6] 0.0 0.00 0.00 1 clear_to_color [6] 57-----------------------------------------------

.
.
What does that mean? I can't even read it!>:(

____

"The unlimited potential has been replaced by the concrete reality of what I programmed today." - Jordan Mechner.

HoHo
Member #4,534
April 2004
avatar

As I said before, profiling is not the best way to find out what is faster, at least not in your case. Problem is, profiling code adds a lot of overhead and it won't give you all that accurate results. Also comparing two programs and their profiler outputs is not very meaningful, especially as Allegro used release code and Fladimir's alphablend functions used much slower profiling code (assuming you added his functions to your project).

As I said, just call each function lots of times (program runtime should be >10 seconds) and time how long it takes for the loops to complete. You can use Allegro timers for measuring the time passed.

Basically, your program should look like this:

1initialize program
2 
3initialize allegro blenders
4al_begin=time()
5loop while counter>0{
6 blend something
7 counter--
8}
9al_end=time()
10 
11initialize Fladimir's blenders
12fl_begin=time()
13loop while counter>0{
14 blend something
15 counter--
16}
17fl_end=time()
18print "allegro took " al_end-al_begin
19print "Fladimir tool" fl_end-fl_begin

It is not that profiling in general is meaningless. Just that it isn't the best tool in your case.

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Thomas Fjellstrom
Member #476
June 2000
avatar

For timing of this sort I'd use something a little higher res ;) like gettimeofday on unix.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

HoHo
Member #4,534
April 2004
avatar

High-res is meaningful if you try to time stuff that take less than a few tenths of a second. For stuff that takes >5s any other method is good enough :)

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Go to: