is it faster in practice to list your boolean values individually, or pack them into bits and get their values with bitwise operators?
In magic land, it seems like bitwise would be faster to get because it's more 'low level'. But I imagine it would actually be slower since you need to operate on the bit before doing the actual true/false comparison.
There are single assembly language instructions to test individual bits, and you can test multiple bits at once with immediate operands, although IIRC those are slightly slower than comparing entire integers to zero (boolean). As always, time it both ways (with rdtsc, if possible) and go with the faster version. If the times are so close as to be inconclusive, go with the "easy to understand" version.
Nota bene: If you have a huge amount of items to test, using the bitfields may be faster since they won't blow out the L1 cache.
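For example, a timing harness can be as small as this (a sketch assuming x86 and GCC/Clang's __rdtsc intrinsic; the loop body is just a placeholder for the code under test):

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang; MSVC has it in <intrin.h> */

int main(void)
{
    volatile int sink = 0;          /* keeps the loop from being optimized away */
    unsigned long long start, end;
    int i;

    start = __rdtsc();
    for (i = 0; i < 1000000; i++)   /* put the code under test here */
        sink += i & 1;
    end = __rdtsc();

    printf("%llu cycles\n", end - start);
    return 0;
}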
I challenge you to write a test program which proves one technique is faster than the other.
bool isBoolFasterThanBitwise() { return rand() & 1; }
(It uses both bitwise and bool!)
I'd bet my bamccaig's left nut this is never going to be a performance bottleneck in any real application.
That being said, I would imagine the bool version is faster, since it deals with the CPU's native word size and has fewer operations. Cache issues might come into play at some point... If you have gazillions of these checks, it might be a memory usage issue more than a performance issue.
I wrote an Eratosthenes sieve prime calculator, and I assumed that using bools (actually it was C, so I used chars...) would be faster than using a bit vector. At one point I started running out of memory, so I switched to using a bit vector and was astounded by how much faster it was.
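Roughly this kind of thing for the bit-vector version (a sketch, not my original code; one bit per number instead of one char per number):

#include <stdio.h>
#include <stdlib.h>

#define N 10000000

int main(void)
{
    /* composite[] packs 8 flags per byte */
    unsigned char *composite = calloc(N / 8 + 1, 1);
    int i, j, count = 0;

    for (i = 2; i < N; i++) {
        if (!(composite[i >> 3] & (1 << (i & 7)))) {
            count++;                                  /* i is prime */
            for (j = i + i; j < N; j += i)
                composite[j >> 3] |= 1 << (j & 7);    /* mark its multiples */
        }
    }
    printf("%d primes below %d\n", count, N);
    free(composite);
    return 0;
}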
But as Jonatan says, in any real program the bit vector is unlikely to be the performance bottleneck.
Unless limited by the amount of memory, I would prefer to use bool.
I would bet that boolean is faster, since at a minimum a bit comes from a byte, and there is the extra bitwise operation to get at the bit you want, plus the division at a higher level to get to the byte you want.
I would use bitfields where you need a lot/have low memory requirements and booleans elsewhere. I doubt it'll make a big difference, except memory wise.
Bitwise will usually be faster, but it depends on how you access those bits. If you access them in a random order it may be slower, but accessing them in order will be a lot faster than using bools. The operators needed to access a single bit are very simple, most of them requiring just one CPU cycle, so they're not really a problem. Packing the data into bits greatly reduces memory bandwidth usage and helps the cache a lot too. Of course, the difference becomes noticeable when you access a lot of these bits (for a pixel-perfect collision algorithm, for example).
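For example, the usual accessors are just a shift and a mask each (a sketch of typical macros over an unsigned char array):

/* a: array of unsigned char, i: bit index */
#define BIT_GET(a, i)   (((a)[(i) >> 3] >> ((i) & 7)) & 1)
#define BIT_SET(a, i)   ((a)[(i) >> 3] |= 1 << ((i) & 7))
#define BIT_CLEAR(a, i) ((a)[(i) >> 3] &= ~(1 << ((i) & 7)))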
As a note, std::vector<bool> is specialized to use a single bit to store each bool.
I would bet that boolean is faster, since at a minimum a bit comes from a byte, and there is the extra bitwise operation to get at the bit you want, plus the division at a higher level to get to the byte you want.
This is what I was saying in my post about prime calculation. If the array is big enough, then the bit vector wins on performance (as well as memory usage) because you get 8 times as much information in the cache.
In magic land, it seems like bitwise would be faster to get because it's more 'low level'. But I imagine it would actually be slower since you need to operate on the bit before doing the actual true/false comparison.
Correct.
You can argue both sides of this; in the end, it'll depend on the particular application. If you're using enough bools that you're hammering the processor cache, then using bit fields may be faster because you can simply fit more data in the cache. If you're not filling up the CPU cache, then a plain bool (int) will probably be faster, since you don't have to do the extra work of extracting the bits.
Ok, you made me do it:
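In outline the two tests look something like this (a sketch only; the array size, fill pattern and timing code here are placeholders rather than the exact values used for the results below):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM (100 * 1000 * 1000)   /* placeholder size */

int main(void)
{
    unsigned char *flags = malloc(NUM);        /* test1: one byte per flag */
    unsigned char *bools = calloc(NUM / 8, 1); /* test2: 8 flags per byte  */
    long i, count;
    int byte, bit;
    clock_t t;

    for (i = 0; i < NUM; i++) {
        int v = rand() & 1;
        flags[i] = v;
        if (v)
            bools[i >> 3] |= 1 << (i & 7);
    }

    /* test1: plain byte-sized flags */
    count = 0;
    t = clock();
    for (i = 0; i < NUM; i++)
        if (flags[i])
            count++;
    printf("count = %ld time = %g seconds\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    /* test2: packed bits, walking a byte/bit counter */
    count = 0;
    byte = 0;
    bit = 0;
    t = clock();
    for (i = 0; i < NUM; i++) {
        if ((bools[byte] >> bit) & 1)
            count++;
        bit++;
        if (bit == 8) {
            byte++;
            bit = 0;
        }
    }
    printf("count = %ld time = %g seconds\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    free(flags);
    free(bools);
    return 0;
}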
test1 results:
count = 52424828 time = 0.990623 seconds
count = 52432056 time = 0.990727 seconds
count = 52430310 time = 0.991179 seconds
test2 results:
count = 52423770 time = 1.33507 seconds
count = 52422236 time = 1.33386 seconds
count = 52423791 time = 1.33561 seconds
Could you try it with
for (i = 0; i < NUM; i++) {
    if ((bools[byte] >> bit) & 1) {
        count++;
    }
    bit++;
    if (bit == 8) {
        byte++;
        bit = 0;
    }
}
Replaced by
for (i = 0; i < NUM; i++) {
    int b = i / 8;
    if ((bools[b] >> (i - b*8)) & 1) {
        count++;
    }
}
?
Just curious whether an "if" and increments are faster than a "%".
EDIT: err, my bad. Needs some more modification
EDIT2: That..might work?
That won't even work, since it never increases byte.
Updated my code. That should work?
count = 52426098
time = 1.5705 seconds
[edit] Off-topic, but interesting just how random rand() is on my system
I wonder what the difference would be for random access of bits, with such a large data set.
Edit: I missed one obvious optimization (that's what you get for only using high-level languages)
for (i = 0; i < NUM; i++) {
    int b = i >> 3;
    if ((bools[b] >> (i - b*8)) & 1) {
        count++;
    }
}
What about:
for (i = 0; i < NUM; i++) {
    if (bools[i] & 1) {
        count++;
    }
}
Yes, it will only test one of the possible 8 bits, but in your test case above you're mostly measuring the speed of changing your bit variable, not what we are interested in, which is the boolean check. In actual code you would also test against a fixed bit (like 1), so that's the speed we're interested in.
[edit:]
Or alternatively, have another loop where you do just your bit manipulation (but not the check) then subtract that time from the result.
First test machine was an Intel C2D. I tested on a single-core AMD and got:
test1: 1.3 seconds
test2: 1.5 seconds
Jonathan's test: 2.0 seconds
[edit]
Elias, actually what we're interested in is testing a bunch of bits. Timing == vs & 1 is meaningless.
[edit2]
Jonathan, with that shift it's 1.4 seconds.
Another test would be, for actual bitfields:
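Something like this, using the language's bit-field syntax and letting the compiler generate the masking (a sketch; note the compiler may pad the struct up to its allocation unit, so the memory saving isn't guaranteed):

#include <stdio.h>
#include <stdlib.h>

#define NUM 10000000

struct flags {
    /* the compiler emits the shift/mask code for these */
    unsigned a : 1, b : 1, c : 1, d : 1;
    unsigned e : 1, f : 1, g : 1, h : 1;
};

int main(void)
{
    struct flags *arr = calloc(NUM, sizeof *arr);
    long i, count = 0;

    for (i = 0; i < NUM; i++)
        arr[i].a = rand() & 1;

    for (i = 0; i < NUM; i++)
        if (arr[i].a)
            count++;

    printf("count = %ld, sizeof(struct flags) = %u\n",
           count, (unsigned)sizeof(struct flags));
    free(arr);
    return 0;
}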
[edit:]
Elias, actually what we're interested in is testing a bunch of bits. Timing == vs & 1 is meaningless.
No, it makes no difference which bit you test. But you can use this instead to test all 8, without the time-consuming code to change a variable:
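That is, something along these lines, reusing bools, NUM and count from the earlier test (a sketch):

/* read each byte once and test its 8 bits with fixed masks,
   no per-iteration bit/byte bookkeeping */
for (i = 0; i < NUM / 8; i++) {
    unsigned char c = bools[i];
    if (c & 0x01) count++;
    if (c & 0x02) count++;
    if (c & 0x04) count++;
    if (c & 0x08) count++;
    if (c & 0x10) count++;
    if (c & 0x20) count++;
    if (c & 0x40) count++;
    if (c & 0x80) count++;
}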
That test is only slightly slower than the bool test. I'll bet the compiler is using bytes there anyhow, and the compiler adds a bit of overhead
count = 52431537
time = 1.08432 seconds
the compiler adds a bit of overhead
You couldn't ask for less overhead!
I think the tests so far aren't very realistic. I'll whip up something better...
Of course if you write suboptimal code it's going to be slower
Why don't you try this instead:
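That is, roughly the following, built around the bit test quoted further down; the loop setup itself is an assumption:

/* sketch: index bits from the top of each byte, one shift + AND per test */
for (i = 0; i < NUM; i++) {
    if (bools[i >> 3] & (0x80 >> (i & 0x7)))
        count++;
}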
Test1: 0.34 sec
Test2: 0.145 sec
I don't have much time, maybe I've made some error with all those bitwise operations, so correct me if there's a mistake.
<edit>
Corrected small error: it was i&0x7, not i&0x8. (I want a mask of the last 3 bits.)
I'd bet my bamccaig's left nut this is never going to be a performance bottleneck in any real application.
You stay away from my left nut.
That's some fancy bitwise magic you got there Oscar, nice
EDIT:
But it's still a teeny bit slower than test 1 (on my machine at least...)
count = 52440724
time = 0.711079 seconds
count = 52427535
time = 0.710474 seconds
count = 52430027
time = 0.71097 seconds
vs.
count = 52432916
time = 0.864145 seconds
count = 52422009
time = 0.865 seconds
count = 52423410
time = 0.863674 seconds
Still, it's better than the other bitfield-tests.
It's still about 1/10 of a second slower here... what machine do you have that it runs in 0.1 seconds? O_O. Also, it tests the bytes in reverse order of the other program.
Here's some random access tests, in these the bool method is a bit faster:
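They check rectangular blocks of bits inside a big 2D mask, the kind of access a pixel-perfect collision test does; roughly like this sketch, with W, H and the block size as placeholders rather than the actual test code:

/* check a 32x32 block at bit position (bx, by) inside a W x H bit mask;
   the caller picks bx and by at random, e.g. rand() % (W - 32) */
#define W 4096
#define H 4096

static int block_hits(const unsigned char *mask, int bx, int by)
{
    int x, y, hits = 0;
    for (y = 0; y < 32; y++) {
        for (x = 0; x < 32; x++) {
            long pos = (long)(by + y) * W + (bx + x);
            if (mask[pos >> 3] & (1 << (pos & 7)))
                hits++;
        }
    }
    return hits;
}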
[edit]: On my amd machine the first test takes 2.3 seconds and the second 2.7. So the results are inconclusive...
Results:
count = 62508973 time = 1.71414 seconds
count = 125017946 time = 1.71485 seconds
count = 187526919 time = 1.71329 seconds
count = 250035892 time = 1.71355 seconds
count = 312544865 time = 1.71301 seconds
count = 62495742 time = 1.47988 seconds
count = 124991484 time = 1.48007 seconds
count = 187487226 time = 1.48014 seconds
count = 249982968 time = 1.47874 seconds
count = 312478710 time = 1.47809 seconds
My test:
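The shape of it is something like this (a sketch rather than the exact code I ran; the fill pattern and timing calls are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BITS 1000
#define NUM 1000000

int main(void)
{
    static unsigned char flags[BITS];          /* bool version: one byte per flag */
    static unsigned char packed[BITS / 8 + 1]; /* bit version: 8 flags per byte   */
    long i, j, count;
    clock_t t;

    for (i = 0; i < BITS; i++) {
        flags[i] = rand() & 1;
        if (flags[i])
            packed[i >> 3] |= 1 << (i & 7);
    }

    /* scan all BITS flags, NUM times over */
    count = 0;
    t = clock();
    for (j = 0; j < NUM; j++)
        for (i = 0; i < BITS; i++)
            if (flags[i])
                count++;
    printf("bool: count: %ld seconds: %f\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    count = 0;
    t = clock();
    for (j = 0; j < NUM; j++)
        for (i = 0; i < BITS; i++)
            if (packed[i >> 3] & (1 << (i & 7)))
                count++;
    printf("bits: count: %ld seconds: %f\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    return 0;
}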
Output:
bool: count: 493000000 seconds: 5.320000
bits: count: 487000000 seconds: 1.900000
So, bits are more than twice as fast as bool here. If I change these two lines:
#define BITS 1000
#define NUM 1000000
to
#define BITS 1000000
#define NUM 1000
The result is:
bool: count: 499845000 seconds: 5.450000
bits: count: 498304000 seconds: 4.370000
So bool is only slightly slower now, which can be explained by having to do more loop iterations to test all the bools.
How I would interpret this is that the only difference is the used memory - the actual if() is not relevant. But with bits you use only 1/8th of the memory so you may end up using the cache more efficiently which can make it much faster.
The numbers aren't that far off... unroll the first loop and it's almost as fast as the second.
[edit]
I think my random access tests were the best as far as actual usage goes... they test 2d blocks of bits inside a 2d array of bits... what you'd do for collision detection. They may not be written optimally though. I would be interested in how those tests perform for other people and on what hardware... and I think enabling optimization in the compiler is not really fair, since these tests are so simple they can be optimized away to nothing, while in a larger program they couldn't.
What have I done!
Some more numbers from randtest1 and 2:
         Intel A100 (Atom predecessor) | Core2 Quad 2.4GHz
test 1   5.3 seconds                   | 1.3 seconds
test 2   5.3 seconds                   | 1.1 seconds
The A100 is about the same on both tests, while the still-older AMD I have is faster on the one-byte bool test. Everything else I've tried is faster on bitfields. It seems like newer CPUs are better at bitfields than older ones.
Oh, when I changed the code to fix the mistake I didn't rerun it (I was in a hurry, and I didn't expect it to change the results).
The change was this:
if (bools[i>>3] & (0x80 >> (i&0x8))) count++;
fixed to:
if (bools[i>>3] & (0x80 >> (i&0x7))) count++;
That simple change made the time go from 0.14 to 0.42. I guess the first one was optimized to a shift :/
I think enabling optimization in the compiler is not really fair, since these tests are so simple they can be optimized away to nothing, while in a larger program they couldn't.
The optimizer can't really remove anything from these tests; they're OK. I think optimizations need to be on to be able to compare how it performs in a real scenario (the bitfield version can benefit from optimizations, unless you do the implementation yourself in assembler).
I guess the first one was optimized to a shift :/
I think it evaluated to "true" a lot less, resulting in fewer "count++" being executed.