I'm not directly accessing framebuffer objects to do what I do, so I don't think that's going to make a huge difference. Especially considering the following:
I have a mid-range video card, a GeForce 9800 GT. I can handle about 10,000 draw calls per frame in an A5 program (with deferred drawing) without dipping below 60 FPS. At 1920x1080, it takes roughly 8,000 draw calls to fill the screen with a 16x16 tilemap, leaving only about 2,000 free for sprites, text, etc.
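For reference, the brute-force pass I'm describing looks roughly like this. It's just a sketch, not my engine code: tile_sheet and tile_id() are placeholders for an atlas and a map lookup, and I'm assuming "deferred drawing" means al_hold_bitmap_drawing(), which batches the bitmap draws:

```c
#include <allegro5/allegro.h>

#define TILE 16

extern ALLEGRO_BITMAP *tile_sheet;      /* atlas of 16x16 tiles (placeholder) */
extern int tile_id(int col, int row);   /* map lookup (placeholder) */

/* Brute force: one draw call per visible tile, batched with
 * al_hold_bitmap_drawing() so they reach the card as one upload. */
void draw_map_brute_force(int cols, int rows)
{
    al_hold_bitmap_drawing(true);
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int id = tile_id(c, r);
            /* assumes tiles laid out in one strip across the atlas */
            al_draw_bitmap_region(tile_sheet, id * TILE, 0, TILE, TILE,
                                  c * TILE, r * TILE, 0);
        }
    }
    al_hold_bitmap_drawing(false);
}
```

At 1920x1080 that's 120 x 68 = 8,160 calls, which is where the roughly-8,000 figure comes from.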
In my game, though, I also have a depth effect going. Achieving that effect by simply drawing everything again would require ANOTHER 24,000 draw calls. Now we're way past the 10,000 limit I can reach, never mind what a user with a low-end graphics card or chipset could manage.
The engine I've developed only draws new rows and columns into the background when necessary. A row of 16x16 tiles takes about 150 draw calls; a column takes about 80. (I'm factoring in that the depth effect requires the buffer to be about 20% larger than the screen.)
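The incremental update amounts to something like this (again a sketch; bg_buffer, tile_sheet, and tile_id() stand in for my real engine state):

```c
#include <allegro5/allegro.h>

#define TILE 16

extern ALLEGRO_BITMAP *bg_buffer;       /* ~20% larger than the screen */
extern ALLEGRO_BITMAP *tile_sheet;      /* atlas of 16x16 tiles (placeholder) */
extern int tile_id(int col, int row);   /* map lookup (placeholder) */

/* Camera crossed a tile boundary vertically: draw only the newly
 * exposed map row into the background buffer at buf_y. */
void draw_new_row(int map_row, int buf_y, int buf_cols)
{
    al_set_target_bitmap(bg_buffer);
    al_hold_bitmap_drawing(true);
    for (int c = 0; c < buf_cols; c++) {    /* ~150 tiles per buffer row */
        int id = tile_id(c, map_row);
        al_draw_bitmap_region(tile_sheet, id * TILE, 0, TILE, TILE,
                              c * TILE, buf_y, 0);
    }
    al_hold_bitmap_drawing(false);
    al_set_target_backbuffer(al_get_current_display());
}
```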
The entire redraw of the background to the screen, including the depth effect, takes a single al_draw_prim() call with 24 vertices passed in, drawing 8 triangles.
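Mechanically that's just a textured triangle list into the backbuffer. Here's a sketch of the shape of it; how the 8 triangles actually carve up the buffer for the depth effect is specific to my engine, so the four quads below are purely illustrative:

```c
#include <string.h>
#include <allegro5/allegro.h>
#include <allegro5/allegro_primitives.h>

extern ALLEGRO_BITMAP *bg_buffer;   /* the oversized background buffer */

/* Write one textured quad (two triangles, six vertices) into v,
 * mapping a w*h region of bg_buffer at (sx, sy) to screen (dx, dy). */
static void quad(ALLEGRO_VERTEX *v, float sx, float sy,
                 float dx, float dy, float w, float h)
{
    ALLEGRO_COLOR white = al_map_rgb(255, 255, 255);
    ALLEGRO_VERTEX q[6] = {
        { dx,     dy,     0, sx,     sy,     white },
        { dx + w, dy,     0, sx + w, sy,     white },
        { dx,     dy + h, 0, sx,     sy + h, white },
        { dx + w, dy,     0, sx + w, sy,     white },
        { dx + w, dy + h, 0, sx + w, sy + h, white },
        { dx,     dy + h, 0, sx,     sy + h, white },
    };
    memcpy(v, q, sizeof(q));
}

void composite_background(void)
{
    ALLEGRO_VERTEX v[24];           /* 24 vertices = 8 triangles */
    /* 4 quads, 6 vertices each; the regions here are made up */
    quad(&v[0],  0,   0,   0,   0,   960, 540);
    quad(&v[6],  960, 0,   960, 0,   960, 540);
    quad(&v[12], 0,   540, 0,   540, 960, 540);
    quad(&v[18], 960, 540, 960, 540, 960, 540);
    al_draw_prim(v, NULL, bg_buffer, 0, 24, ALLEGRO_PRIM_TRIANGLE_LIST);
}
```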
Draw calls required per frame at 60 FPS when:
Moving diagonally at high speed: 2 cols + 2 rows + BG = 160 + 300 + 1 = 461
Moving vertically at high speed: 3 rows + BG = 450 + 1 = 451
Not moving at all: BG = 1
That's substantially less than the 10,000 I can hit, and well within the limits of many systems.
I've tested this engine on several systems now, and even on the poor ones it reaches framerates that some commercial 2D titles using tilemaps can't hit!