blit(with height=1) much slower than blit(with height=2).


Samuel DEVULDER

Member #14,923

February 2013

Hello!

I'm new to this forum so if this isn't the right place for such a question, please forgive me, and redirect me to the proper place.

I've got a new PC (64-bit AMD, ATI Radeon, Windows 8) and installed an emulator on it (http://sourceforge.net/projects/teoemulator/) that uses Allegro. Note that it uses an old 4.0 version of Allegro because the software has to run on old machines (say Win9x). The emulator works correctly on rather old PCs (WinXP, 32-bit, Intel graphics, 6 years old), but on the new PC, with certain rendering modes, it runs painfully slowly.

More precisely, I've done some profiling and discovered that when blit() is called with height=1, Allegro uses a lot of CPU time (apparently in kernel mode). If blit() is called with height=2 (or more), the CPU usage drops to a very small fraction and the kernel time is no longer noticeable.

I've browsed the code of blit() and discovered that, on my machine, it calls _linear_blit32, which is plain x86 asm. I don't see any reason for _linear_blit32 to run faster with height=2 than with height=1, but the fact is that it does. For some reason, moving less data from the bitmap onto the screen takes much more time than moving the whole screen. Is there anything special about this assembly code that makes it crawl on AMD64 processors? I'm thinking of weird alignment issues or the like. Have other people noticed this behavior? What can be done to work around it?

In fact, the single-line blit is used to simulate the interlaced display of old CRTs. Is there something cleverer than blit() for copying the odd lines of one bitmap onto the screen? Maybe some existing games have already developed a neat technique to simulate old CRTs using Allegro. Any information on such a technique would be welcome as well.

Regards,

sam.

Arthur Kalliokoski

Second in Command

February 2005

Samuel DEVULDER said:

I'm new to this forum so if this isn't the right place for such a question, please forgive me, and redirect me to the proper place.

The Allegro Development forum is for developing the Allegro library itself; you wanted to post in Programming Questions. Not really a huge deal, maybe a moderator will move this to that forum later.

As for the main question, I don't have 4.0 handy, but in 4.4.2 I see it has WRITE_BANK and UNWRITE_BANK macros, which I'd guess are there for VESA 1.x compatibility, intended to work with 16-bit machines which can only access 64 KB maximum per segment. If this program is always using linear screen modes, I'd say you should be able to get away with moving the data yourself with memmove(), as long as you take the scanline width and screen width along with bits per pixel into account to get the starting address of each scanline and the proper number of bytes to move. You might also get a speed increase over a blit of height 1 simply by using _getpixel<number> and _putpixel<number>, i.e. if you're in a 16 bit per pixel mode, you'd use _getpixel16() and _putpixel16().
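A minimal sketch of the memmove() idea, written in plain C so it can be tried without Allegro. The `linear_surface` struct, `copy_scanline()` and `copy_field()` are hypothetical names invented for this illustration, and 32 bits per pixel is assumed because the original post mentions _linear_blit32. In a real Allegro 4 program the row addresses would have to come from the library's own bitmap structures, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BYTES_PER_PIXEL = 4 };  /* assumption: 32 bpp */

/* Hypothetical descriptor for a linear surface; this is not an Allegro type. */
typedef struct {
    unsigned char *base;  /* address of pixel (0, 0) */
    size_t pitch;         /* bytes between the starts of consecutive scanlines */
    int width;            /* visible pixels per scanline */
    int height;           /* number of scanlines */
} linear_surface;

/* Address of the first byte of scanline y. */
static unsigned char *scanline_ptr(const linear_surface *s, int y)
{
    assert(y >= 0 && y < s->height);
    return s->base + (size_t)y * s->pitch;
}

/* Copy one scanline; only the pixels both surfaces have in common are moved. */
void copy_scanline(const linear_surface *src, const linear_surface *dst, int y)
{
    int w = src->width < dst->width ? src->width : dst->width;
    memmove(scanline_ptr(dst, y), scanline_ptr(src, y),
            (size_t)w * BYTES_PER_PIXEL);
}

/* Copy every second scanline, starting at first_line
   (0 = even field, 1 = odd field). */
void copy_field(const linear_surface *src, const linear_surface *dst,
                int first_line)
{
    int h = src->height < dst->height ? src->height : dst->height;
    int y;
    for (y = first_line; y < h; y += 2)
        copy_scanline(src, dst, y);
}
```

Note that the pitch is kept separate from the width: the start of each scanline is computed from the pitch, while the number of bytes moved comes from the width and the bytes per pixel.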

They all watch too much MSNBC... they get ideas.

Samuel DEVULDER

Member #14,923

February 2013

Is there any reason for the XXX_BANK macros being several orders of magnitude slower with single-line blits than with multi-line blits? Do newer versions of Allegro suffer from the same issue?

Thomas Fjellstrom

Member #476

June 2000

Newer versions of Allegro dumped the ASM completely. They depend on the compiler to do a decent job optimizing. The compiler can generally do a much better job optimizing for a variety of cpus than even a competent ASM programmer, and ALWAYS does a better job than our non-existent ASM maintainer.

--
Thomas Fjellstrom
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Arthur Kalliokoski

Second in Command

February 2005

Samuel DEVULDER said:

Is there any reason for the XXX_BANK macros being several orders of magnitude slower with single-line blits than with multi-line blits?

If it's crossing the memory banks, it has to go into kernel mode to change the bank it's reading/writing from, so possibly the multiline blits are doing that much less frequently than the single line blits.

The Manual said:

You should be aware, however, that a lot of SVGA cards don't provide separate read and write banks, which means that blitting from one part of the screen to another requires the use of a temporary bitmap in memory, and is therefore extremely slow.

Maybe if you use linear screen modes (with the GFX_VESA2L driver) this won't happen.
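To put rough numbers on the bank-switching guess above, here is a small arithmetic sketch in plain C. It assumes a banked mode with a 64 KB window, 32 bits per pixel and a pitch of 2560 bytes (640 pixels); `bank_of()` and `bank_changes()` are hypothetical helpers, not Allegro functions, and nothing in the thread confirms that Allegro 4.0 on Windows actually takes a banked path.

```c
#define BANK_SIZE 65536UL  /* assumed 64 KB window, as in the VESA 1.x case */

/* Index of the bank that holds the first byte of pixel (x, y). */
unsigned long bank_of(unsigned long pitch, unsigned long bytes_per_pixel,
                      unsigned long x, unsigned long y)
{
    return (y * pitch + x * bytes_per_pixel) / BANK_SIZE;
}

/* Number of bank changes needed to walk rows y0 .. y0+h-1 of an area that is
   w pixels wide and starts at column x0, visiting rows in order.
   Assumes w >= 1, h >= 1 and that one row is no larger than one bank. */
unsigned long bank_changes(unsigned long pitch, unsigned long bytes_per_pixel,
                           unsigned long x0, unsigned long w,
                           unsigned long y0, unsigned long h)
{
    unsigned long current = bank_of(pitch, bytes_per_pixel, x0, y0);
    unsigned long changes = 0;
    unsigned long y;

    for (y = y0; y < y0 + h; y++) {
        unsigned long first = bank_of(pitch, bytes_per_pixel, x0, y);
        unsigned long last = bank_of(pitch, bytes_per_pixel, x0 + w - 1, y);
        if (first != current)
            changes++;  /* the row starts in a different bank */
        if (last != first)
            changes++;  /* the row straddles a bank boundary */
        current = last;
    }
    return changes;
}
```

Under these assumed numbers, `bank_changes(2560, 4, 0, 640, 0, 480)` gives 18 for a full 640x480 frame, roughly one change every 25 rows, while a single row that stays inside one bank gives 0. How a height-1 blit interacts with bank selection inside Allegro is not established in this thread, so the figures only bound the cost of the boundary crossings themselves.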
