Premultiplied alpha is clear now. The contribution of the source is 'pre-calculated': its color has already been scaled by its own alpha, so it can be added as-is. The contribution of the destination is then calculated using the alpha of the source, i.e. it is scaled down by one minus the source alpha.
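To check my own understanding, here is a minimal sketch of that arithmetic for one 8-bit channel. It assumes the usual premultiplied setup (source factor one, destination factor inverse source alpha); the function name `blend_premul` is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: blend one premultiplied 8-bit source channel
   over a destination channel.
   result = src + dst * (255 - src_alpha) / 255 */
uint8_t blend_premul(uint8_t src, uint8_t src_alpha, uint8_t dst)
{
    /* Premultiplied data never has a channel larger than its alpha. */
    assert(src <= src_alpha);

    unsigned inv = 255u - src_alpha;               /* share left for dst */
    unsigned out = src + (dst * inv + 127u) / 255u; /* rounded division  */
    return (uint8_t)(out > 255u ? 255u : out);      /* clamp, just in case */
}
```

A fully transparent source (channel 0, alpha 0) leaves the destination untouched, and a fully opaque one replaces it.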
For the 'clear-except-alpha-channel workaround' I have two options:
1//Very pseudo code
alpha_mask; //Black with alpha pattern
buffer2; //No premultiply alpha
add, one, one);
add, one, zero, add, zero, one);
Clearly option #2 is smaller and doesn't use an extra buffer, but my gut says #1 uses faster operations, i.e. the rectfill with a blender would take some more time. It is very much an edge case (in my mind), but I'd like to see how they compare in practice.
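For reference, this is what I expect the six-argument blender of option #2 to compute per pixel. It is only a model in plain C, and it assumes the arguments read as: color op, color source factor, color destination factor, then the same three for alpha. The `Pixel` type and the function name are hypothetical.

```c
/* Hypothetical pixel type with components in the range 0..1. */
typedef struct {
    float r, g, b, a;
} Pixel;

/* Model of (add, one, zero, add, zero, one):
   color = src * 1 + dst * 0   -> color comes from the source
   alpha = src * 0 + dst * 1   -> alpha stays as the destination had it */
Pixel blend_color_from_src_alpha_from_dst(Pixel src, Pixel dst)
{
    Pixel out;
    out.r = src.r * 1.0f + dst.r * 0.0f;
    out.g = src.g * 1.0f + dst.g * 0.0f;
    out.b = src.b * 1.0f + dst.b * 0.0f;
    out.a = src.a * 0.0f + dst.a * 1.0f;
    return out;
}
```

Under that reading, filling a rectangle with black clears the color channels and leaves the alpha pattern alone, which is the workaround I'm after.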
or else lock a bitmap region and alter the pixel values directly.
This looks at least faster than using a blender and drawing a filled rectangle... No wait; HW accel is out of the question then... but then you can resort to fast CPU instructions... and then it gets out of my league
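A rough sketch of what the direct-pixel route could look like on the CPU side. A plain byte buffer stands in for the locked region here; the 4-bytes-per-pixel R, G, B, A layout, the byte `pitch` parameter, and the function name are all assumptions, not something taken from the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero the color bytes of a w x h rectangle at (x, y)
   and leave every alpha byte as it was.
   Assumes 4 bytes per pixel in R, G, B, A order; pitch is in bytes. */
void clear_rgb_keep_alpha(uint8_t *data, int pitch, int x, int y, int w, int h)
{
    assert(data != NULL);

    for (int row = y; row < y + h; row++) {
        /* First pixel of this row inside the rectangle. */
        uint8_t *p = data + (ptrdiff_t)row * pitch + (ptrdiff_t)x * 4;

        for (int col = 0; col < w; col++, p += 4) {
            p[0] = 0; /* red   */
            p[1] = 0; /* green */
            p[2] = 0; /* blue  */
            /* p[3] (alpha) is deliberately not written */
        }
    }
}
```

The inner loop is simple enough that a compiler may vectorize it, but how this compares to the blender route would still have to be measured.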
IMHO interesting to compare, but I've lost my USB stick ATM...
BTW could anybody comment on the hardware acceleration part? The manual is silent and the source is confusing.