Slowness when drawing fonts

Thomas Fjellstrom

Member #476

June 2000

Edgar Reynaldo said:

Why does it have to be drawn first?

I don't know. But I assume it has to do with how non-fixed-width fonts will render a specific sentence. A sentence's length depends on the order of the characters. Sometimes a row here or there is skipped.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Edgar Reynaldo

Major Reynaldo

May 2007

@Arthur
I got verdana.ttf to look good by using the flag ALLEGRO_TTF_MONOCHROME when loading the font. It disables anti-aliasing for the font. It also made consola.ttf look chunky and ugly though, so it doesn't work for every font.

@Thomas
It still seems like it should just be a sum of widths and offsets based on a lookup. Without knowing the guts of Allegro 5 and FreeType though, I couldn't say if it could be done.

I expanded my test to include a second measurement of al_get_text_width for the same strings, and it took just as long as the first time! That has got to be the slowest function I've seen in a while. It takes 0.3 seconds to measure the widths of thirteen 5-character strings? I guess that precludes me from making any text editors with Allegro 5 then.

Why doesn't al_get_text_width take advantage of the fact that the glyphs it is using are already cached?

Eagle and Allegro 5 binaries | Older Allegro 4 and 5 binaries | Allegro 5 compile guide
King Piccolo will make it so Then I could just flip it with some sweet, sweet matrix math. In either case, I’ll ride this wagon until the wheels fall off. Thanks to those that keep it running

Peter Wang

Member #23

April 2000

Matthew Leverton said:

Theoretically, yes. While it may seem more proper to put big new features into a 5.2 branch, it (in my opinion) isn't worth it because maintaining two major releases is a pain in the butt. People will expect the older 5.0 branch to be continued to be maintained (bug fixed), which is just a hassle when it comes to providing binaries.

I would leave 5.2 alone until we come up with things that either completely break backward compatibility or are major changes to the core or how things are done. I don't know what would fit in that latter category.

Peter W's opinion would be more useful than mine.

I've had problems with my internet connection the last few days, so I haven't been following closely.

My plan was to stick with the Allegro 4 model, of which there are two. In 4.0.x we maintained forward and backwards compatibility, meaning no new symbols would be added in a stable branch. In 4.2.x, we dropped the forwards compatibility requirement. Following that model, we could add new symbols within the 5.0.x branch. I haven't done that so far, even for small additions.

Our development model doesn't really allow backporting of big changes. All sorts of stuff gets dumped together into the unstable branch then refined over time. It's not easy to untangle the final state of a single feature for inclusion into the stable branch. The alternative is to stabilise all the new features together at once, then declare the unstable branch 'stable'. Since we already call the unstable branch '5.1', it would be a bit confusing to release it as '5.0.4'... but possible.

Matthew Leverton

Supreme Loser

January 1999

Whether something is called 5.0.X or 5.2.0 doesn't matter to me as long as we aren't trying to maintain 5.0 for very long after 5.2 is released. Obviously programs that dynamically link against 5.0 would cease getting free upgrades, but I don't particularly care about that...

Would you include new addons in 5.0? Seems like those shouldn't require any changes to the core Allegro that would be hard to backport.

--
RTFM | Follow Me on Google+ | I know 10 people

Slartibartfast

Member #8,789

June 2007

AMCerasoli said:

If that "someone" had read the entire thread he would know that I'm using multi-thread but the example (which is presenting the same problem) isn't.

Someone has read the entire thread before trying to help you.
Someone has also seen it explained to you over and over again that:
1) You must load fonts from the "main" thread. (technically you have to load them from the thread that creates the display if I understand correctly)
2) If certain glyphs in the font are uncached, they will be cached when they are first drawn (because there's no better way to automatically choose what to cache) but
3) You can cause the glyphs to be cached yourself (for example by writing "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-=" once before the actual game logic is started), which means that there will be no delays during the game when first writing something.
This is it. This is how things are.
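To make points 2 and 3 concrete, here is a minimal sketch of a lazily filled cache. It is a toy model, not Allegro's TTF addon: `ToyGlyphCache` and `toy_touch_text` are made-up names, and a "miss" only stands in for the real cost of rasterising and uploading a glyph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a lazily filled glyph cache. This is NOT Allegro code:
 * the struct and function are hypothetical and only count cache misses. */
typedef struct {
    bool cached[256];   /* one flag per 8-bit character code */
} ToyGlyphCache;

/* Marks every glyph used by text as cached and returns how many were new.
 * In a real renderer each new glyph would cost a rasterise plus an upload. */
static size_t toy_touch_text(ToyGlyphCache *cache, const char *text)
{
    size_t misses = 0;
    assert(cache != NULL && text != NULL);
    for (const unsigned char *p = (const unsigned char *)text; *p; ++p) {
        if (!cache->cached[*p]) {
            cache->cached[*p] = true;
            ++misses;
        }
    }
    return misses;
}
```

Touching the alphabet once during loading means later calls with game text report no misses, which is the whole point of the warm-up string.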

You however keep repeating that Allegro should cache some glyphs for you when first loading the font, and again it has been explained to you that it will only move the "horrible delay" from when you (the programmer) choose to draw/cache the font (for example, by doing (3) from before) to the loading function.
You can even implement the function you want to have by yourself!
Something like: (not guaranteed to work, just an example)

#include <allegro5/allegro_font.h>

ALLEGRO_FONT *AMCerasoli_load_and_cache(const char *filename, int size, int flags)
{
 ALLEGRO_FONT *font = al_load_font(filename, size, flags);
 /* Measuring a string forces its glyphs into the cache before the game starts. */
 if (font)
  al_get_text_width(font, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890");
 return font;
}

AMCerasoli said:

my current game is running at 30FPS logic and drawing, for that reason I can see it.

I probably shouldn't ask, but how is your game timed to 30FPS? You didn't by chance time it like the example by simply calling al_rest() after each loop, right?
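The reason the question matters can be shown with a little arithmetic. This is only a sketch with hypothetical numbers: a loop that rests a fixed amount after each frame runs slower than intended as soon as the frame itself takes time, whereas a loop that waits for a periodic timer tick (in Allegro, typically a timer created with al_create_timer) holds its rate as long as the work fits in the period.

```c
#include <assert.h>

/* Frames per second when each loop does work_ms of work and then
 * sleeps for a fixed rest_ms. The rest does not shrink when the
 * work grows, so the real rate falls below the intended one. */
static double fps_with_fixed_rest(double work_ms, double rest_ms)
{
    assert(work_ms >= 0.0 && rest_ms >= 0.0 && work_ms + rest_ms > 0.0);
    return 1000.0 / (work_ms + rest_ms);
}

/* Frames per second when the loop instead waits for the next tick of a
 * period_ms timer; the rate only drops if the work exceeds the period. */
static double fps_with_timer(double work_ms, double period_ms)
{
    assert(work_ms >= 0.0 && period_ms > 0.0);
    return 1000.0 / (work_ms > period_ms ? work_ms : period_ms);
}
```

With 5 ms of work per frame and a rest of 1000/30 ms, the first function gives roughly 26 FPS rather than 30, while the timer version stays at 30.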

----
"No man who has at least one good hand has to rape someone if he's horny and can't get any." - Neil Black
"If God truly would send you to hell for trusting your senses and logic, I'd rather spend eternity alongside likeminded people than with a bunch of ignorant fools." - mEmO
"I'm going gay again" - weapon_S

Elias

Member #358

May 2000

Edgar Reynaldo said:

The manual says absolutely nothing about al_get_text_width caching glyphs. Are we supposed to be psychic?

No, the manual and/or code should be improved. That's why threads like this are good (if anyone will care to submit a patch)

Quote:

PS. Why does verdana.ttf look like (some characters are noticeably lighter or darker than others) ?

Try setting your web browser to use verdana.ttf (with the same size). Does it look different there? If not then the problem is with the font. Otherwise if it looks better than Allegro there, try using another program using freetype (for me that's firefox on linux, which has pixel-by-pixel 100% identical rgb values to Allegro here). If there is a difference then the ttf addon does something wrong and we should try to fix it.

Edgar Reynaldo said:

I expanded my test to include a second measurement of al_get_text_width for the same strings, and it took just as long as the first time!

Hm, maybe al_get_text_width doesn't cache after all, let me check the code...

[edit:]

I can't try any examples from here, but at least it does look like it uses the cache (if (glyph->bitmap)): http://allefant.com/gtags/S/1921.html#L178

Is your example updated to do that second measurement? kazzmir's timing results suggest that caching is being used (and improves time a lot). But AMCerasoli also saw no improvement after the al_get_text_width() line it seems... does allegro.log say anything suspicious? There may very well be some bug in the glyph caching code (I remember reports here where the first cached glyph was never drawn for example but could never find a cause.)

Maybe you and AMCerasoli are both using DirectX but everyone else is using OpenGL? And then the bug would be somewhere in the DirectX code, maybe it fails locking very small bitmap regions very often or something.

--
"Either help out or stop whining" - Evert

Michał Cichoń

Member #11,736

March 2010

Linux makes the worst possible use of FreeType, so this is a bad example. The cause of the odd-looking Verdana is probably in the A5 TTF addon code. I haven't looked at it yet.

"God starts from scratch too"
Windows Allegro Build Repo: http://targonski.nazwa.pl/thedmd/allegro/

Elias

Member #358

May 2000

Michał Cichoń said:

Linux makes the worst possible use of FreeType

Got some reference for that?

Michał Cichoń

Member #11,736

March 2010

I think this describes the issue.
I mostly agree with the opinion in this article.

Elias

Member #358

May 2000

Interesting, but at least in my Linux (Ubuntu, but manually changed the gnome font settings until it looked good) Freetype just seems to be perfectly tuned then. Coincidentally, text there looks almost like in Windows (and on my Mac indeed all text looks worse than both of those)

Michał Cichoń

Member #11,736

March 2010

FreeType requires at least a basic knowledge of typography and how it applies to raster grids. The people responsible for font rendering in our new engine are very creative in the field of misuse.
I have to agree that Linux (Gnome, actually) has improved in the field of text rendering.
Ok, maybe we should leave this as a digression, because how exactly "good looking" text is defined depends on who is looking.
Please go back to the performance problems.

Elias

Member #358

May 2000

Edgar Reynaldo: I modified your example to do the cache timing twice and this is what it looks like here:
[Screenshot 604237: timing output of the modified example]

So yes, caching is slow. But at least it does seem to work and al_get_text_width() is fast once the glyphs are cached.

Peter Wang

Member #23

April 2000

The glyph caching in the TTF addon should probably be optimised by caching more than one glyph at a time, thereby removing redundant lock/unlocks. (I didn't check it though).

Edgar Reynaldo

Major Reynaldo

May 2007

There was a bug - I was adding the wrong variable to the second cache's total time.

After I fixed it, the second cache time was almost negligible. I also compared DIRECT3D to OPENGL and OPENGL_3_0 drivers :

Direct3D :

c:\ctwoplus\progcode\allegro5\test>A5Bouncer.exe
Average load time     = 1.278 ms
Average cache time    = 310.100 ms
Average cache2 time   = 0.021 ms
Average display time  = 0.202 ms
Test count = 100

OpenGL :

c:\ctwoplus\progcode\allegro5\test>A5Bouncer.exe
Average load time     = 0.639 ms
Average cache time    = 41.820 ms
Average cache2 time   = 0.014 ms
Average display time  = 0.295 ms
Test count = 500

OpenGL 3.0 :
Couldn't test, failed to create a display with this option on my laptop.

OpenGL is about 7.4 times as fast as Direct3D when creating the initial cache of glyphs (41.820 ms vs 310.100 ms) and about twice as fast to load, but slightly slower to draw.
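The ratios can be checked directly from the two sets of averages posted above. The helper below is just a division with a descriptive name (`speedup` is not part of the test program):

```c
#include <assert.h>

/* Ratio of two average times in the same unit: how many times
 * faster the second measurement is than the first. */
static double speedup(double slow_ms, double fast_ms)
{
    assert(fast_ms > 0.0);
    return slow_ms / fast_ms;
}
```

Plugging in the posted averages: speedup(310.100, 41.820) is about 7.4 for the first cache pass, speedup(1.278, 0.639) is 2.0 for loading, and speedup(0.295, 0.202) is about 1.46 in Direct3D's favour for drawing.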

Karadoc ~~

Member #2,749

September 2002

I just ran the test on my computer.
Average load time = 0.34 ms
Average cache time = 25 ms
Average cache2 time = 0.016 ms
Average display time = 0.065 ms

But... The main reason I'm posting is that the first time I ran it the average cache time was more like 300ms. It dropped dramatically the second time without me having changed anything. I can only guess that the drop was a result of some kind of Windows 7 automatic optimization.

(I assume it's using Direct3D.)

-----------

Arthur Kalliokoski

Second in Command

February 2005

Karadoc ~~ said:

the drop was a result of some kind of Windows 7 automatic optimization.

I'd guess verdana.ttf was already in file buffers.

They all watch too much MSNBC... they get ideas.

Elias

Member #358

May 2000

Peter Wang said:

The glyph caching in the TTF addon should probably be optimised by caching more than one glyph at a time, thereby removing redundant lock/unlocks. (I didn't check it though).

Yes, if someone makes a patch for an extra al_cache_glyph_range function as Edgar proposed it could do that. However I'm not sure it really would be faster. At least with OpenGL locking a bitmap region only locks that region (i.e. GPU transfer is limited to the locked area). Multiple calls therefore only save the lock commands themselves. Since the al_get_text_width("abc...") approach does not do any other OpenGL calls between the lock/unlock calls it might have little to no effect. The driver might even optimize it into a single lock already. But only trying it out would tell of course.

About the DirectX case, not sure why it's so slow. I don't understand how locking works there at all - if someone likes to read up on MSDN I'm sure we could make it as fast as OpenGL. If it's that every lock/unlock call has to download/upload the whole bitmap from/to the GPU then reducing the times the bitmap is locked will have a big impact there of course.

Peter Wang

Member #23

April 2000

I was thinking we'd just cache the new glyphs as a first pass over the string in the render() method, then draw them in a second pass. That way you can minimise the number of times you lock the cache bitmap. I had a quick try at this, but it's not a completely trivial change.
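A rough sketch of that two-pass idea, as a toy model rather than the actual TTF addon code: `ToyCache` and both render functions are hypothetical, and a "lock" here is only a counter standing in for a lock/unlock pair on the cache bitmap. It also assumes all new glyphs fit in one cache bitmap, which the real code cannot take for granted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not the TTF addon: counts how often a cache bitmap
 * would be locked while rendering text. All names are hypothetical. */
typedef struct {
    bool cached[256];
    size_t locks;       /* number of lock/unlock pairs so far */
} ToyCache;

/* One lock per missing glyph, taken as each glyph is met while drawing. */
static void render_one_lock_per_glyph(ToyCache *c, const char *text)
{
    assert(c != NULL && text != NULL);
    for (const unsigned char *p = (const unsigned char *)text; *p; ++p) {
        if (!c->cached[*p]) {
            c->locks++;             /* lock, write one glyph, unlock */
            c->cached[*p] = true;
        }
        /* drawing the glyph would happen here */
    }
}

/* First pass caches all missing glyphs under a single lock;
 * a second pass (not modelled) would only draw. */
static void render_two_pass(ToyCache *c, const char *text)
{
    bool locked = false;
    assert(c != NULL && text != NULL);
    for (const unsigned char *p = (const unsigned char *)text; *p; ++p) {
        if (!c->cached[*p]) {
            if (!locked) {          /* lock once, on the first miss */
                c->locks++;
                locked = true;
            }
            c->cached[*p] = true;
        }
    }
    /* unlock here if locked, then draw every glyph in a second loop */
}
```

For "hello world" on an empty cache, the per-glyph version takes eight locks (one per distinct character) and the two-pass version takes one.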

I think it would make a big difference. On the benchmark you posted, I can reduce the average cache time from 19.5 ms to ~13 ms just by reducing the cache bitmap size from 256x256 to 128x128. If I switch to using memory bitmaps, the cache time drops to 0.9 ms (obviously drawing time goes up).

Elias

Member #358

May 2000

Hm, I guess something with locking doesn't work the way I thought then.

Or maybe your numbers can be explained by the fact that we do an unconditional texture upload of the complete texture whenever a bitmap is first created. This is of course stupid. Whenever a 256x256 cache is created al_create_bitmap allocates 256x256 pixels worth of uninitialized memory and uploads them to a texture just to clear that texture immediately afterwards. Instead we should leave the texture un-initialized in the first place (pass NULL to glTexImage2D). (This will also speed up al_load_bitmap.)

Also, looking at the ttf code we do some stupid things during cache creation, for example:

                    unsigned char c = *ptr;
                    float cf = c / 255.0f;
                    *dptr++ = 255 * cf;
                    *dptr++ = 255 * cf;
                    *dptr++ = 255 * cf;
                    *dptr++ = c;

That's a useless divide, three useless multiplies and 4 useless float/char conversions for each pixel. Probably negligible performance-wise but still hurts my eyes
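For comparison, a sketch of the same loop body without the float round trip. It assumes, as the quoted snippet suggests but does not show, that the source is one coverage byte per pixel and the destination is four bytes per pixel; the function names are made up for illustration. Whether the float version yields exactly the same bytes for every input depends on how the compiler evaluates float expressions, so the second helper counts mismatches instead of assuming there are none.

```c
#include <assert.h>
#include <stddef.h>

/* Expands n 8-bit coverage values into 4-byte pixels in which every
 * channel equals the coverage, with no float arithmetic involved. */
static void coverage_to_rgba(const unsigned char *src, unsigned char *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        unsigned char c = src[i];
        *dst++ = c;
        *dst++ = c;
        *dst++ = c;
        *dst++ = c;
    }
}

/* Counts the coverage values for which the quoted float expression,
 * stored back into a byte, differs from the value itself. */
static int count_float_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; ++v) {
        unsigned char c = (unsigned char)v;
        float cf = c / 255.0f;
        unsigned char via_float = (unsigned char)(255 * cf);
        if (via_float != c)
            ++mismatches;
    }
    return mismatches;
}
```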

[edit:]

I changed both of those things, but neither made a difference.

Evert

Member #794

November 2000

Elias said:

Probably negligible performance-wise but still hurts my eyes [edit:]I changed both of those things, but neither made a difference.

The compiler may be clever enough to figure out that it should eliminate both the divide and the multiply. Maybe. Maybe not.
I wouldn't expect a massive performance boost either way, but yes, the sight of that code makes you want to slam your head against your desk (or it would, were I not on the sofa with a fever).

Elias

Member #358

May 2000

Ok, according to valgrind --tool=cachegrind, 75% of all time is spent in glTexSubImage2D<-ogl_unlock_region<-al_unlock_region<-al_unlock_bitmap<-cache_glyph<-render_glyph<-text_length<-al_get_text_width.

(92% of all time is spent in al_get_text_width - FT_Load_Glyph only takes up about 8%)

So yes, caching multiple glyphs at a time and thus reducing the calls to glTexSubImage2D seems to be the way to optimize this

Also interesting, half of the time spent in glTexSubImage2D is actually shown inside of memcpy - so it seems the first thing my OpenGL driver does is copy the passed memory somewhere else. Could also have to do with valgrind, everything runs very slow under it so its profiling results can be quite inaccurate.

Peter Wang

Member #23

April 2000

Ok, I'm working on it.

Thomas Fjellstrom

Member #476

June 2000

Yeah, one of the things you want to do with many OpenGL drivers is cut down on the number of OpenGL calls as much as possible. Many of them do a lot of setup and teardown for one call, and it can add up.

Peter Wang

Member #23

April 2000

In 5.1 now:

[Screenshot 604282]

The bottleneck now is clearing new bitmaps(!). This is only a hack to avoid artefacts when OpenGL samples the pixels around the actual glyph. If I comment that out:

[Screenshot 604283]

(I don't understand the other timing differences.)
The clearing is avoidable, but is too tedious for me to do.

Edgar Reynaldo

Major Reynaldo

May 2007

WOW!

Any chance you can do the same thing for the Direct3D drivers?