DevCpp Learning to profile with
Paul whoknows

I am using devcpp since a few days ago.
I have 2 functions for drawing anti-aliased sprites to a memory buffer, one is allegro's draw_trans_sprite(), and the other one is Fladimir's AlphaBlend32().
I want to know how much faster the fastest one is.

I am linking to liballp.a, and the compiler option -pg.
Attached is the source file with an alpha sprite for testing.

Here is the loop, I am calling both functions at the same time, I think that's the best way for testing, any advices?

1int main() {
2 /* Setup Allegro */
3 allegro_init();
4 install_keyboard();
5 install_mouse();
6 
7 /*set color depth in 32bpp */
8 set_color_depth( 32 );
9 set_gfx_mode( GFX_AUTODETECT_WINDOWED, 800, 600, 0, 0 );
10 //set_gfx_mode( GFX_AUTODETECT, 800, 600, 0, 0 );
11
12 BITMAP *buffer = create_bitmap( SCREEN_W, SCREEN_H );
13 clear_to_color(buffer, makecol(255, 0, 0));
14
15 /* must be a 32-bit bitmap with an alpha channel! */
16 BITMAP *sprite = load_bitmap( "olafito.tga", NULL );
17 if(sprite == NULL ) allegro_message("not found\n");
18
19set_alpha_blender();
20 
21do {
22 
23
24 /* draw with AlphaBlend32*/
25 AlphaBlend32( sprite, buffer, mouse_x-sprite->w/2, mouse_y-sprite->h/2, 255 );
26
27 /* draw with draw_trans_sprite */
28 draw_trans_sprite(buffer, sprite, mouse_x-sprite->w/2, mouse_y-sprite->h/2);
29
30
31 blit( buffer, screen, 0, 0, 0, 0, SCREEN_W, SCREEN_H );
32 
33 
34}while (!key[KEY_ESC]);
35 return 0;
36}

.
.

HoHo

Linking with profiling versions of Allegro is only meaningful if you want to profile Allegro itself. If you want to profile your program then compile it with -pg and link with regular allegro. That way allegro won't be slowed down by being profiled.

In case of your program I'd just have two loops that repeat one or the other function for million times. The one who takes less time wins. Profilers won't be much of use to you, unless you intend to improve implementations of these functions.

Paul whoknows
Quote:

Linking with profiling versions of Allegro is only meaningful if you want to profile Allegro itself. If you want to profile your program then compile it with -pg and link with regular allegro. That way allegro won't be slowed down by being profiled.

I did not know that, in MSVC6 I was using the debug and profile version of allegro for debuging and profiling my program, that was not really necesary?

Back on topic, I changed my loop code following your advice.
I tested the program with draw_trans_sprite and AlphaBlend32, one first, and the other later.

#SelectExpand
1set_alpha_blender(); 2int counter=0; 3do { 4 5 6 /* draw with AlphaBlend32*/ 7 AlphaBlend32( sprite, buffer, mouse_x-sprite->w/2, mouse_y-sprite->h/2, 255 ); 8 9 /* draw with draw_trans_sprite */ 10 // draw_trans_sprite(buffer, sprite, mouse_x-sprite->w/2, mouse_y-sprite->h/2); 11 12 13 blit( buffer, screen, 0, 0, 0, 0, SCREEN_W, SCREEN_H ); 14 counter++; 15 16}while (!key[KEY_ESC] && counter<10000); 17 return 0; 18}

.
.
This is what I obtained for AlphaBlend32:
.
.

#SelectExpand
1Each sample counts as 0.01 seconds. 2 % cumulative self self total 3 time seconds seconds calls ms/call ms/call name 4 54.97 8.46 8.46 10000 0.85 1.54 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) 5 28.20 12.80 4.34 202430000 0.00 0.00 BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) 6 16.50 15.34 2.54 768000000 0.00 0.00 geta32 7 0.19 15.37 0.03 2400000 0.00 0.00 bmp_write_line 8 0.13 15.39 0.02 _mangled_main() 9 0.00 15.39 0.00 10000 0.00 0.00 CallEmms() 10 0.00 15.39 0.00 10000 0.00 0.00 acquire_bitmap 11 0.00 15.39 0.00 10000 0.00 0.00 bmp_unwrite_line 12 0.00 15.39 0.00 10000 0.00 0.00 release_bitmap 13 0.00 15.39 0.00 1 0.00 0.00 WinMain@16 14 0.00 15.39 0.00 1 0.00 0.00 clear_to_color 15 16 % the percentage of the total running time of the 17time program used by this function. 18 19cumulative a running sum of the number of seconds accounted 20 seconds for by this function and those listed above it. 21 22 self the number of seconds accounted for by this 23seconds function alone. This is the major sort for this 24 listing. 25 26calls the number of times this function was invoked, if 27 this function is profiled, else blank. 28 29 self the average number of milliseconds spent in this 30ms/call function per call, if this function is profiled, 31 else blank. 32 33 total the average number of milliseconds spent in this 34ms/call function and its descendents per call, if this 35 function is profiled, else blank. 36 37name the name of the function. This is the minor sort 38 for this listing. The index shows the location of 39 the function in the gprof listing. If the index is 40 in parenthesis it shows where it would appear in 41 the gprof listing if it were to be printed. 42 43 Call graph (explanation follows) 44 45 46granularity: each sample hit covers 4 byte(s) for 0.06% of 15.39 seconds 47 48index % time self children called name 49 <spontaneous> 50[1] 100.0 0.02 15.37 _mangled_main() [1] 51 8.46 6.91 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 52 0.00 0.00 1/1 clear_to_color [13] 53----------------------------------------------- 54 8.46 6.91 10000/10000 _mangled_main() [1] 55[2] 99.9 8.46 6.91 10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 56 4.34 0.00 202430000/202430000 BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) [3] 57 2.54 0.00 768000000/768000000 geta32 [4] 58 0.03 0.00 2400000/2400000 bmp_write_line [5] 59 0.00 0.00 10000/10000 acquire_bitmap [9] 60 0.00 0.00 10000/10000 CallEmms() [8] 61 0.00 0.00 10000/10000 bmp_unwrite_line [10] 62 0.00 0.00 10000/10000 release_bitmap [11] 63----------------------------------------------- 64 4.34 0.00 202430000/202430000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 65[3] 28.2 4.34 0.00 202430000 BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) [3] 66----------------------------------------------- 67 2.54 0.00 768000000/768000000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 68[4] 16.5 2.54 0.00 768000000 geta32 [4] 69----------------------------------------------- 70 0.03 0.00 2400000/2400000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 71[5] 0.2 0.03 0.00 2400000 bmp_write_line [5] 72----------------------------------------------- 73 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 74[8] 0.0 0.00 0.00 10000 CallEmms() [8] 75----------------------------------------------- 76 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 77[9] 0.0 0.00 0.00 10000 acquire_bitmap [9] 78----------------------------------------------- 79 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 80[10] 0.0 0.00 0.00 10000 bmp_unwrite_line [10] 81----------------------------------------------- 82 0.00 0.00 10000/10000 AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2] 83[11] 0.0 0.00 0.00 10000 release_bitmap [11] 84----------------------------------------------- 85 0.00 0.00 1/1 main [76] 86[12] 0.0 0.00 0.00 1 WinMain@16 [12] 87----------------------------------------------- 88 0.00 0.00 1/1 _mangled_main() [1] 89[13] 0.0 0.00 0.00 1 clear_to_color [13] 90-----------------------------------------------

.
.
This is what I obtained for draw_trans_sprite
.
.

#SelectExpand
1Flat profile: 2 3Each sample counts as 0.01 seconds. 4 % cumulative self self total 5 time seconds seconds calls Ts/call Ts/call name 6100.00 0.01 0.01 _mangled_main() 7 0.00 0.01 0.00 10000 0.00 0.00 draw_trans_sprite 8 0.00 0.01 0.00 1 0.00 0.00 WinMain@16 9 0.00 0.01 0.00 1 0.00 0.00 clear_to_color 10 11 % the percentage of the total running time of the 12time program used by this function. 13 14cumulative a running sum of the number of seconds accounted 15 seconds for by this function and those listed above it. 16 17 self the number of seconds accounted for by this 18seconds function alone. This is the major sort for this 19 listing. 20 21calls the number of times this function was invoked, if 22 this function is profiled, else blank. 23 24 self the average number of milliseconds spent in this 25ms/call function per call, if this function is profiled, 26 else blank. 27 28 total the average number of milliseconds spent in this 29ms/call function and its descendents per call, if this 30 function is profiled, else blank. 31 32name the name of the function. This is the minor sort 33 for this listing. The index shows the location of 34 the function in the gprof listing. If the index is 35 in parenthesis it shows where it would appear in 36 the gprof listing if it were to be printed. 37 38 Call graph (explanation follows) 39 40 41granularity: each sample hit covers 4 byte(s) for 100.00% of 0.01 seconds 42 43index % time self children called name 44 <spontaneous> 45[1] 100.0 0.01 0.00 _mangled_main() [1] 46 0.00 0.00 10000/10000 draw_trans_sprite [4] 47 0.00 0.00 1/1 clear_to_color [6] 48----------------------------------------------- 49 0.00 0.00 10000/10000 _mangled_main() [1] 50[4] 0.0 0.00 0.00 10000 draw_trans_sprite [4] 51----------------------------------------------- 52 0.00 0.00 1/1 main [76] 53[5] 0.0 0.00 0.00 1 WinMain@16 [5] 54----------------------------------------------- 55 0.00 0.00 1/1 _mangled_main() [1] 56[6] 0.0 0.00 0.00 1 clear_to_color [6] 57-----------------------------------------------

.
.
What does that mean? I can't even read it!>:(

HoHo

As I said before, profiling is not the best way to find out what is faster, at least not in your case. Problem is, profiling code adds a lot of overhead and it won't give you all that accurate results. Also comparing two programs and their profiler outputs is not very meaningful, especially as Allegro used release code and Fladimir's alphablend functions used much slower profiling code (assuming you added his functions to your project).

As I said, just call each function lots of times (program runtime should be >10 seconds) and time how long it takes for the loops to complete. You can use Allegro timers for measuring the time passed.

Basically, your program should look like this:

1initialize program
2 
3initialize allegro blenders
4al_begin=time()
5loop while counter>0{
6 blend something
7 counter--
8}
9al_end=time()
10 
11initialize Fladimir's blenders
12fl_begin=time()
13loop while counter>0{
14 blend something
15 counter--
16}
17fl_end=time()
18print "allegro took " al_end-al_begin
19print "Fladimir tool" fl_end-fl_begin

It is not that profiling in general is meaningless. Just that it isn't the best tool in your case.

Thomas Fjellstrom

For timing of this sort I'd use something a little higher res ;) like gettimeofday on unix.

HoHo

High-res is meaningful if you try to time stuff that take less than a few tenths of a second. For stuff that takes >5s any other method is good enough :)

Thread #589857. Printed from Allegro.cc