DevCpp Learning to profile with

DevCpp Learning to profile with

Paul whoknows

Member #5,081

September 2004

I am using devcpp since a few days ago.
I have 2 functions for drawing anti-aliased sprites to a memory buffer, one is allegro's draw_trans_sprite(), and the other one is Fladimir's AlphaBlend32().
I want to know how much faster the fastest one is.

I am linking to liballp.a, and the compiler option -pg.
Attached is the source file with an alpha sprite for testing.

Here is the loop, I am calling both functions at the same time, I think that's the best way for testing, any advices?

1int main() {
 /* Setup Allegro */
 allegro_init();
 install_keyboard();
 install_mouse();
6 
 /*set color depth in 32bpp  */   
 set_color_depth( 32 );
 set_gfx_mode( GFX_AUTODETECT_WINDOWED, 800, 600, 0, 0 );
 //set_gfx_mode( GFX_AUTODETECT, 800, 600, 0, 0 );
 
 BITMAP *buffer = create_bitmap( SCREEN_W, SCREEN_H );
 clear_to_color(buffer, makecol(255, 0, 0));
 
 /* must be a 32-bit bitmap with an alpha channel! */
 BITMAP *sprite = load_bitmap( "olafito.tga", NULL );
 if(sprite == NULL ) allegro_message("not found\n");
 
19set_alpha_blender();
20 
21do {   
22 
 
 /* draw with AlphaBlend32*/
    AlphaBlend32( sprite, buffer, mouse_x-sprite->w/2, mouse_y-sprite->h/2, 255 );
    
 /* draw with draw_trans_sprite */   
    draw_trans_sprite(buffer, sprite, mouse_x-sprite->w/2, mouse_y-sprite->h/2);
    
  
 blit( buffer, screen, 0, 0, 0, 0, SCREEN_W, SCREEN_H );
32 
33 
34}while (!key[KEY_ESC]);
 return 0;
36}

.
.

____

"The unlimited potential has been replaced by the concrete reality of what I programmed today." - Jordan Mechner.

HoHo

Member #4,534

April 2004

Linking with profiling versions of Allegro is only meaningful if you want to profile Allegro itself. If you want to profile your program then compile it with -pg and link with regular allegro. That way allegro won't be slowed down by being profiled.

In case of your program I'd just have two loops that repeat one or the other function for million times. The one who takes less time wins. Profilers won't be much of use to you, unless you intend to improve implementations of these functions.

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Paul whoknows

Member #5,081

September 2004

Quote:

Linking with profiling versions of Allegro is only meaningful if you want to profile Allegro itself. If you want to profile your program then compile it with -pg and link with regular allegro. That way allegro won't be slowed down by being profiled.

I did not know that, in MSVC6 I was using the debug and profile version of allegro for debuging and profiling my program, that was not really necesary?

Back on topic, I changed my loop code following your advice.
I tested the program with draw_trans_sprite and AlphaBlend32, one first, and the other later.

#SelectExpand
  1set_alpha_blender();
  2int counter=0;
  3do {   
  4
  5   
  6   /* draw with AlphaBlend32*/
  7      AlphaBlend32( sprite, buffer, mouse_x-sprite->w/2, mouse_y-sprite->h/2, 255 );
  8      
  9   /* draw with draw_trans_sprite */   
 10    //  draw_trans_sprite(buffer, sprite, mouse_x-sprite->w/2, mouse_y-sprite->h/2);
 11      
 12    
 13   blit( buffer, screen, 0, 0, 0, 0, SCREEN_W, SCREEN_H );
 14   counter++;
 15
 16}while (!key[KEY_ESC] && counter<10000);
 17   return 0;
 18}

.
.
This is what I obtained for AlphaBlend32:
.
.

#SelectExpand
  1Each sample counts as 0.01 seconds.
  2  %   cumulative   self              self     total           
  3 time   seconds   seconds    calls  ms/call  ms/call  name    
  4 54.97      8.46     8.46    10000     0.85     1.54  AlphaBlend32(BITMAP*, BITMAP*, int, int, int)
  5 28.20     12.80     4.34 202430000     0.00     0.00  BlendColors32NoEmms(unsigned long, unsigned long, unsigned long)
  6 16.50     15.34     2.54 768000000     0.00     0.00  geta32
  7  0.19     15.37     0.03  2400000     0.00     0.00  bmp_write_line
  8  0.13     15.39     0.02                             _mangled_main()
  9  0.00     15.39     0.00    10000     0.00     0.00  CallEmms()
 10  0.00     15.39     0.00    10000     0.00     0.00  acquire_bitmap
 11  0.00     15.39     0.00    10000     0.00     0.00  bmp_unwrite_line
 12  0.00     15.39     0.00    10000     0.00     0.00  release_bitmap
 13  0.00     15.39     0.00        1     0.00     0.00  WinMain@16
 14  0.00     15.39     0.00        1     0.00     0.00  clear_to_color
 15
 16 %         the percentage of the total running time of the
 17time       program used by this function.
 18
 19cumulative a running sum of the number of seconds accounted
 20 seconds   for by this function and those listed above it.
 21
 22 self      the number of seconds accounted for by this
 23seconds    function alone.  This is the major sort for this
 24           listing.
 25
 26calls      the number of times this function was invoked, if
 27           this function is profiled, else blank.
 28 
 29 self      the average number of milliseconds spent in this
 30ms/call    function per call, if this function is profiled,
 31     else blank.
 32
 33 total     the average number of milliseconds spent in this
 34ms/call    function and its descendents per call, if this 
 35     function is profiled, else blank.
 36
 37name       the name of the function.  This is the minor sort
 38           for this listing. The index shows the location of
 39     the function in the gprof listing. If the index is
 40     in parenthesis it shows where it would appear in
 41     the gprof listing if it were to be printed.
 42
 43         Call graph (explanation follows)
 44
 45
 46granularity: each sample hit covers 4 byte(s) for 0.06% of 15.39 seconds
 47
 48index % time    self  children    called     name
 49                                                 <spontaneous>
 50[1]    100.0    0.02   15.37                 _mangled_main() [1]
 51                8.46    6.91   10000/10000       AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 52                0.00    0.00       1/1           clear_to_color [13]
 53-----------------------------------------------
 54                8.46    6.91   10000/10000       _mangled_main() [1]
 55[2]     99.9    8.46    6.91   10000         AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 56                4.34    0.00 202430000/202430000     BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) [3]
 57                2.54    0.00 768000000/768000000     geta32 [4]
 58                0.03    0.00 2400000/2400000     bmp_write_line [5]
 59                0.00    0.00   10000/10000       acquire_bitmap [9]
 60                0.00    0.00   10000/10000       CallEmms() [8]
 61                0.00    0.00   10000/10000       bmp_unwrite_line [10]
 62                0.00    0.00   10000/10000       release_bitmap [11]
 63-----------------------------------------------
 64                4.34    0.00 202430000/202430000     AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 65[3]     28.2    4.34    0.00 202430000         BlendColors32NoEmms(unsigned long, unsigned long, unsigned long) [3]
 66-----------------------------------------------
 67                2.54    0.00 768000000/768000000     AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 68[4]     16.5    2.54    0.00 768000000         geta32 [4]
 69-----------------------------------------------
 70                0.03    0.00 2400000/2400000     AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 71[5]      0.2    0.03    0.00 2400000         bmp_write_line [5]
 72-----------------------------------------------
 73                0.00    0.00   10000/10000       AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 74[8]      0.0    0.00    0.00   10000         CallEmms() [8]
 75-----------------------------------------------
 76                0.00    0.00   10000/10000       AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 77[9]      0.0    0.00    0.00   10000         acquire_bitmap [9]
 78-----------------------------------------------
 79                0.00    0.00   10000/10000       AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 80[10]     0.0    0.00    0.00   10000         bmp_unwrite_line [10]
 81-----------------------------------------------
 82                0.00    0.00   10000/10000       AlphaBlend32(BITMAP*, BITMAP*, int, int, int) [2]
 83[11]     0.0    0.00    0.00   10000         release_bitmap [11]
 84-----------------------------------------------
 85                0.00    0.00       1/1           main [76]
 86[12]     0.0    0.00    0.00       1         WinMain@16 [12]
 87-----------------------------------------------
 88                0.00    0.00       1/1           _mangled_main() [1]
 89[13]     0.0    0.00    0.00       1         clear_to_color [13]
 90-----------------------------------------------

.
.
This is what I obtained for draw_trans_sprite
.
.

#SelectExpand
  1Flat profile:
  2
  3Each sample counts as 0.01 seconds.
  4  %   cumulative   self              self     total           
  5 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  6100.00      0.01     0.01                             _mangled_main()
  7  0.00      0.01     0.00    10000     0.00     0.00  draw_trans_sprite
  8  0.00      0.01     0.00        1     0.00     0.00  WinMain@16
  9  0.00      0.01     0.00        1     0.00     0.00  clear_to_color
 10
 11 %         the percentage of the total running time of the
 12time       program used by this function.
 13
 14cumulative a running sum of the number of seconds accounted
 15 seconds   for by this function and those listed above it.
 16
 17 self      the number of seconds accounted for by this
 18seconds    function alone.  This is the major sort for this
 19           listing.
 20
 21calls      the number of times this function was invoked, if
 22           this function is profiled, else blank.
 23 
 24 self      the average number of milliseconds spent in this
 25ms/call    function per call, if this function is profiled,
 26     else blank.
 27
 28 total     the average number of milliseconds spent in this
 29ms/call    function and its descendents per call, if this 
 30     function is profiled, else blank.
 31
 32name       the name of the function.  This is the minor sort
 33           for this listing. The index shows the location of
 34     the function in the gprof listing. If the index is
 35     in parenthesis it shows where it would appear in
 36     the gprof listing if it were to be printed.
 37
 38         Call graph (explanation follows)
 39
 40
 41granularity: each sample hit covers 4 byte(s) for 100.00% of 0.01 seconds
 42
 43index % time    self  children    called     name
 44                                                 <spontaneous>
 45[1]    100.0    0.01    0.00                 _mangled_main() [1]
 46                0.00    0.00   10000/10000       draw_trans_sprite [4]
 47                0.00    0.00       1/1           clear_to_color [6]
 48-----------------------------------------------
 49                0.00    0.00   10000/10000       _mangled_main() [1]
 50[4]      0.0    0.00    0.00   10000         draw_trans_sprite [4]
 51-----------------------------------------------
 52                0.00    0.00       1/1           main [76]
 53[5]      0.0    0.00    0.00       1         WinMain@16 [5]
 54-----------------------------------------------
 55                0.00    0.00       1/1           _mangled_main() [1]
 56[6]      0.0    0.00    0.00       1         clear_to_color [6]
 57-----------------------------------------------

.
.
What does that mean? I can't even read it!>:(

____

"The unlimited potential has been replaced by the concrete reality of what I programmed today." - Jordan Mechner.

HoHo

Member #4,534

April 2004

As I said before, profiling is not the best way to find out what is faster, at least not in your case. Problem is, profiling code adds a lot of overhead and it won't give you all that accurate results. Also comparing two programs and their profiler outputs is not very meaningful, especially as Allegro used release code and Fladimir's alphablend functions used much slower profiling code (assuming you added his functions to your project).

As I said, just call each function lots of times (program runtime should be >10 seconds) and time how long it takes for the loops to complete. You can use Allegro timers for measuring the time passed.

Basically, your program should look like this:

1initialize program
2 
3initialize allegro blenders
4al_begin=time()
5loop while counter>0{
6  blend something
7  counter--
8} 
9al_end=time()
10 
11initialize Fladimir's blenders
12fl_begin=time()
13loop while counter>0{
14  blend something
15  counter--
16} 
17fl_end=time()
18print "allegro took " al_end-al_begin
19print "Fladimir tool" fl_end-fl_begin

It is not that profiling in general is meaningless. Just that it isn't the best tool in your case.

Thomas Fjellstrom

Member #476

June 2000

For timing of this sort I'd use something a little higher res like gettimeofday on unix.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

HoHo

Member #4,534

April 2004

High-res is meaningful if you try to time stuff that take less than a few tenths of a second. For stuff that takes >5s any other method is good enough