Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » bool faster than bitwise?

This thread is locked; no one can reply to it. rss feed Print
 1   2 
bool faster than bitwise?
Jonatan Hedborg
Member #4,886
July 2004
avatar

That's some fancy bitwise magic you got there Oscar, nice :D

EDIT:
But it's still a teeny bit slower than test 1 (on my machine at least...)

count = 52440724
time = 0.711079 seconds
count = 52427535
time = 0.710474 seconds
count = 52430027
time = 0.71097 seconds

vs.

count = 52432916
time = 0.864145 seconds
count = 52422009
time = 0.865 seconds
count = 52423410
time = 0.863674 seconds

Still, it's better than the other bitfield-tests.

Trent Gamblin
Member #261
April 2000
avatar

It's still about 1/10 of a second slower here... what machine do you have that it runs in 0.1 seconds? O_O. Also, it tests the bytes in reverse order of the other program.

Here's some random access tests, in these the bool method is a bit faster:
[edit]: On my amd machine the first test takes 2.3 seconds and the second 2.7. So the results are inconclusive...

#SelectExpand
1// randtest1.c 2#include <stdio.h> 3#include <stdlib.h> 4#include <time.h> 5 6#define NUM 1024*1024*100 7#define SQRT_NUM 10240 8#define SIZE 50 9#define LOOPS 50000 10#define TESTS 5 11 12char bools[NUM]; 13int r[LOOPS*2]; 14 15int main(void) 16{ 17 clock_t start, end; 18 int test; 19 int loop; 20 int i, j; 21 int count = 0; 22 23 srand(time(NULL)); 24 25 for (i = 0; i < NUM; i++) { 26 bools[i] = rand() % 2; 27 if (i < LOOPS*2) 28 r[i] = ((float)(rand()%RAND_MAX)/RAND_MAX)*(SQRT_NUM-SIZE); 29 } 30 31 for (test = 0; test < TESTS; test++) { 32 start = clock(); 33 34 for (loop = 0; loop < LOOPS; loop++) { 35 int x = r[loop*2]; 36 int y = r[loop*2+1]; 37 for (j = y; j < y+SIZE; j++) { 38 for (i = x; i < x+SIZE; i++) { 39 if (bools[j*SQRT_NUM+i]) { 40 count++; 41 } 42 } 43 } 44 } 45 46 end = clock(); 47 48 printf("count = %d\n", count); 49 50 printf("time = %g seconds\n", (float)(end-start)/CLOCKS_PER_SEC); 51 } 52}

#SelectExpand
1// randtest2.c 2#include <stdio.h> 3#include <stdlib.h> 4#include <time.h> 5 6#define NUM 1024*1024*100 7#define SQRT_NUM 10240 8#define LOOPS 50000 9#define SIZE 50 10#define TESTS 5 11 12char bools[NUM/8]; 13int r[LOOPS*2]; 14 15int main(void) 16{ 17 int count = 0; 18 clock_t start, end; 19 int bit = 0; 20 int byte = 0; 21 int test; 22 int loop; 23 int i, j; 24 25 srand(time(NULL)); 26 27 for (i = 0; i < NUM/8; i++) { 28 bools[i] = rand() % 256; 29 if (i < LOOPS*2) 30 r[i] = ((float)(rand() % RAND_MAX)/RAND_MAX)*(SQRT_NUM-SIZE); 31 } 32 33 for (test = 0; test < TESTS; test++) { 34 start = clock(); 35 36 for (loop = 0; loop < LOOPS; loop++) { 37 int x = r[loop*2]; 38 int y = r[loop*2+1]; 39 for (j = y; j < y+SIZE; j++) { 40 for (i = x; i < x+SIZE; i++) { 41 int byte = (j*SQRT_NUM+i)>>3; 42 if ((bools[byte] >> (x%8)) & 1) { 43 count++; 44 } 45 } 46 } 47 } 48 49 end = clock(); 50 51 printf("count = %d\n", count); 52 53 printf("time = %g seconds\n", (float)(end-start)/CLOCKS_PER_SEC); 54 } 55}

Results:

count = 62508973
time = 1.71414 seconds
count = 125017946
time = 1.71485 seconds
count = 187526919
time = 1.71329 seconds
count = 250035892
time = 1.71355 seconds
count = 312544865
time = 1.71301 seconds

count = 62495742
time = 1.47988 seconds
count = 124991484
time = 1.48007 seconds
count = 187487226
time = 1.48014 seconds
count = 249982968
time = 1.47874 seconds
count = 312478710
time = 1.47809 seconds

Elias
Member #358
May 2000

My test:

#SelectExpand
1#include <stdio.h> 2#include <stdlib.h> 3#include <time.h> 4 5#define BITS 1000 6#define NUM 1000000 7 8volatile bool bools[BITS]; 9volatile unsigned char bits[BITS / 8]; 10volatile int count1, count2; 11 12int main(void) { 13 14 for (int j = 0; j < BITS; j++) { 15 bools[j] = rand() % 2; 16 } 17 18 for (int j = 0; j < BITS / 8; j++) { 19 bits[j] = rand() % 256; 20 } 21 22 clock_t t1 = clock(); 23 count1 = 0; 24 for (int i = 0; i < NUM; i++) { 25 for (int j = 0; j < BITS; j++) { 26 if (bools[j] & 1) count1++; 27 } 28 } 29 clock_t t2 = clock(); 30 31 clock_t t3 = clock(); 32 count2 = 0; 33 for (int i = 0; i < NUM; i++) { 34 for (int j = 0; j < BITS / 8; j++) { 35 if (bits[j] & 1) count2++; 36 if (bits[j] & 2) count2++; 37 if (bits[j] & 4) count2++; 38 if (bits[j] & 8) count2++; 39 if (bits[j] & 16) count2++; 40 if (bits[j] & 32) count2++; 41 if (bits[j] & 64) count2++; 42 if (bits[j] & 128) count2++; 43 } 44 } 45 clock_t t4 = clock(); 46 47 printf("bool: count: %d seconds: %f\n", count1, (t2 - t1) / (double)CLOCKS_PER_SEC); 48 printf("bits: count: %d seconds: %f\n", count2, (t4 - t3) / (double)CLOCKS_PER_SEC); 49 50 return 0; 51}

Output:

bool: count: 493000000 seconds: 5.320000
bits: count: 487000000 seconds: 1.900000

So, bits are more than twice as fast as bool here. If I change these two lines:

#define BITS 1000
#define NUM 1000000

to

#define BITS 1000000
#define NUM 1000

The result is:

bool: count: 499845000 seconds: 5.450000
bits: count: 498304000 seconds: 4.370000

So bool is only slightly slower now, which can be explained by having to do more loop iterations to test all the bools.

How I would interpret this is that the only difference is the used memory - the actual if() is not relevant. But with bits you use only 1/8th of the memory so you may end up using the cache more efficiently which can make it much faster.

--
"Either help out or stop whining" - Evert

Trent Gamblin
Member #261
April 2000
avatar

The numbers aren't that far off... unroll the first loop and it's almost as fast as the second.

[edit]

I think my random access tests were the best as far as actual usage goes... they test 2d blocks of bits inside a 2d array of bits... what you'd do for collision detection. They may not be written optimally though. I would be interested in how those tests perform for other people and on what hardware... and I think enabling optimizing in the compiler is not really fair since these tests are so simple and can be optimized away to nothing while in a larger program they couldn't.

Evert
Member #794
November 2000
avatar

What have I done! :o

Trent Gamblin
Member #261
April 2000
avatar

Some more numbers from randtest1 and 2:

        Intel A100 (atom predecessor) | Core2 Quad 2.4Ghz
test 1      5.3 seconds               |      1.3 seconds 
test 2      5.3 seconds               |      1.1 seconds

The A100 is about the same on both tests, while the older still AMD I have is faster on the 1 byte bool test. Everything else I've tried is faster on bitfields. Seems like newer CPUs are better at bitfields than older ones.

Oscar Giner
Member #2,207
April 2002
avatar

Oh, when I changed the code to fix the mistake I didn't rerun it (I was in a hurry, and I didn't expect it to change the results).

The change was this:

 if (bools[i>>3] & (0x80 >> (i&0x8))) count++;

fixed to:

 if (bools[i>>3] & (0x80 >> (i&0x7))) count++;

That simple change made the time go from 0.14 to 0.42 :o I guess the first one was optimized to a shift :/

I think enabling optimizing in the compiler is not really fair since these tests are so simple and can be optimized away to nothing while in a larger program they couldn't.

The optimizer can't really remove anything from these tests, they're ok. I think optimizations need to be on to be able to compare how it performs in a real scenario (the bitfield version can benefit from optimizations, unless you do the implementation yourself in assembler).

Jonatan Hedborg
Member #4,886
July 2004
avatar

I guess the first one was optimized to a shift :/

I think it evaluated to "true" a lot less, resulting in fewer "count++" being executed.

 1   2 


Go to: