Incrementing an integer is not thread-safe, period. If you read the whole thing, you would have seen the analysis of the assembly code.
You seem to be ignoring the point of that: when a variable is only written to by one thread, the atomicity of the entire read-modify-write operation becomes irrelevant. Only the atomicity of the write itself matters, because nothing else can modify the variable asynchronously to that code (only one thread modifies it). And the write is generally atomic, provided that the integer is aligned. (Caveats: not true on a 386SX paired with a 32-bit compiler, some 8-bit CPU/compiler combinations, and a few other arcane places. But to get even a remotely possible failure you have to find a platform that uses one of the handful of CPUs on which that's not true, uses multiple CPUs concurrently, and runs Allegro programs; you'd have an easier time finding a system that uses 10-bit bytes.)
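To make the single-writer argument concrete, here is a minimal sketch. It uses C11 atomics with relaxed ordering to spell out the "aligned write is atomic" assumption instead of relying on a plain volatile int; the names tick_count, writer_tick and reader_get are made up for illustration and are not from the code under discussion.

```c
#include <stdatomic.h>

/* Hypothetical counter: written by exactly ONE thread, read by any thread. */
static atomic_int tick_count;

/* Call from the single writer thread only.
 * The load and the store are two separate operations, so the increment as a
 * whole is NOT an atomic read-modify-write. That is fine here: no other
 * thread writes tick_count, so nothing can change between the two steps. */
void writer_tick(void)
{
    int old = atomic_load_explicit(&tick_count, memory_order_relaxed);
    atomic_store_explicit(&tick_count, old + 1, memory_order_relaxed);
}

/* Call from any thread. Each load sees some value the writer stored
 * (never a torn one), though not necessarily the most recent. */
int reader_get(void)
{
    return atomic_load_explicit(&tick_count, memory_order_relaxed);
}
```

Note that this only covers the counter itself. It says nothing about the ordering of other memory the counter refers to, which is exactly the buffer-index problem.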
You are just asking for trouble especially when the integer is an index to a buffer.
That is the part I'm agreeing with now - that using the variable as an index into a changing buffer is unsafe. According to my current understanding, this code can return garbage input (though it almost never will). It could be fixed with threading libraries, platform-specific locking mechanisms, or more exotic coding methods. However, I think that the Allegro internals have the same flaw in them, so you suffer from this flaw regardless...
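As one example of the "more exotic" route, here is a rough sketch of a single-producer/single-consumer ring buffer using C11 acquire/release atomics, so the reader never sees an index before the data it points to. This is not Allegro's code or the code from this thread; KeyBuf, keybuf_push and keybuf_pop are hypothetical names, and it assumes exactly one producer (say, the input callback) and one consumer (the main loop).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define KEYBUF_SIZE 64u  /* power of two, so index math survives unsigned wraparound */

typedef struct {
    int data[KEYBUF_SIZE];
    atomic_uint head;  /* next slot to write; stored only by the producer */
    atomic_uint tail;  /* next slot to read; stored only by the consumer */
} KeyBuf;

/* Producer side. Returns false if the buffer is full. */
bool keybuf_push(KeyBuf *b, int value)
{
    unsigned head = atomic_load_explicit(&b->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&b->tail, memory_order_acquire);
    if (head - tail == KEYBUF_SIZE)
        return false;
    b->data[head % KEYBUF_SIZE] = value;
    /* Release: the data write above becomes visible before the new head. */
    atomic_store_explicit(&b->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the buffer is empty. */
bool keybuf_pop(KeyBuf *b, int *out)
{
    unsigned tail = atomic_load_explicit(&b->tail, memory_order_relaxed);
    /* Acquire: pairs with the release store in keybuf_push. */
    unsigned head = atomic_load_explicit(&b->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = b->data[tail % KEYBUF_SIZE];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&b->tail, tail + 1, memory_order_release);
    return true;
}
```

A KeyBuf with static storage starts out zeroed, which is a valid empty buffer. With more than one producer or more than one consumer this scheme is no longer correct, and you are back to real locks.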
CPUs have gone multicore, you have forgotten that???
I remembered the multicore part, but did not realize that Intel and others had weakened their memory coherency policies (I expected at least PRAM consistency for writes to volatile variables). And I presumed that Allegro would arrange things in such a way that its input callbacks were usable in a safe manner without extra dependencies or extraordinary effort.
Okay, I was too tired when I came to the conclusion that it wasn't safe. x86 DOES still offer strong memory coherency; I misread the docs as saying the opposite of what they said. Here's my current position (hopefully I will slow down the rate at which I change positions soon):
The code is correct on some platforms only.
Allegro code is correct on some platforms only.
x86/*: Allegro and this code both work, because of strong memory coherency guarantees on normal x86 systems.
non-x86/linux: Allegro and this code probably work correctly, if the user code is totally single-threaded. Merely having only one thread talk to Allegro is not enough, though: it must also be the thread that handles signals. On x86 specifically the issue is (I think) irrelevant, because of the x86/* strong coherency guarantees. However, the buf.lock mechanism should be kept, because in (very rare) cases the callback can become reentrant.
PPC/macOSX: Allegro and this code are both incorrect on this platform.
PPC/macOS9 and x86/DOS: Allegro and this code are correct, because (I think) the code all executes in interrupt context anyway.
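For the reentrancy point in the Linux case above, a guard in the spirit of buf.lock might look roughly like this. I'm guessing at the shape here, not quoting the actual mechanism: run_guarded, callback_busy and demo_handler are hypothetical names, and simply refusing the nested call is only one possible policy (the real code may defer the event instead).

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef void (*InputHandler)(int event);

/* Set while a handler is running; hypothetical stand-in for buf.lock. */
static atomic_flag callback_busy = ATOMIC_FLAG_INIT;

/* Runs handler(event) unless a handler is already in progress.
 * Returns false if the call was refused because of reentry. */
bool run_guarded(InputHandler handler, int event)
{
    if (atomic_flag_test_and_set_explicit(&callback_busy, memory_order_acquire))
        return false;  /* already inside the callback: do not reenter */
    handler(event);
    atomic_flag_clear_explicit(&callback_busy, memory_order_release);
    return true;
}

/* Demo handler that deliberately tries to reenter when event == 1. */
static int handled_count;
static bool nested_call_ran;

static void demo_handler(int event)
{
    handled_count++;
    if (event == 1)
        nested_call_ran = run_guarded(demo_handler, 0);
}
```

Calling run_guarded(demo_handler, 1) runs the handler once; the nested attempt is refused and nested_call_ran stays false.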