is it faster in practice to list your boolean values individually, or pack them into bits and get their values with bitwise operators?
In magic land, it seems like bitwise would be faster to get because it's more 'low level'. But I imagine it would actually be slower since you need to operate on the bit before doing the actual true/false comparison.
There are single assembly language instructions to test individual bits, and you can test multiple bits at once with immediate operands, although IIRC those are slightly slower than comparing entire integers to zero (boolean). As always, time it both ways (with rdtsc, if possible) and go with the faster version. If the times are so close as to be inconclusive, go with the "easy to understand" version.
Nota bene: If you have a huge amount of items to test, using the bitfields may be faster since they won't blow out the L1 cache.
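For example, a timing harness can be as small as this (a sketch assuming x86 and GCC/Clang's __rdtsc intrinsic; the loop body is just a placeholder for the code under test):

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang; MSVC has it in <intrin.h> */

int main(void)
{
    volatile int sink = 0;          /* keeps the loop from being optimized away */
    unsigned long long start, end;
    int i;

    start = __rdtsc();
    for (i = 0; i < 1000000; i++)   /* put the code under test here */
        sink += i & 1;
    end = __rdtsc();

    printf("%llu cycles\n", end - start);
    return 0;
}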
I challenge you to write a test program which proves one technique is faster than the other.
bool isBoolFasterThanBitwise() { return rand() & 1; }
(It uses both bitwise and bool!)
I'd bet my bamccaig's left nut this is never going to be a performance bottleneck in any real application.
That being said, I would imagine the bool version is faster, since it deals with the CPU's native word size and has fewer operations. Cache issues might come into play at some point... If you have gazillions of these checks, it might be a memory usage issue more than a performance issue.
I wrote an Eratosthenes sieve prime calculator, and I assumed that using bools (actually it was C, so I used chars...) would be faster than using a bit vector. At one point I started running out of memory, so I switched to using a bit vector and was astounded by how much faster it was.
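Roughly this kind of thing for the bit-vector version (a sketch, not my original code; one bit per number instead of one char per number):

#include <stdio.h>
#include <stdlib.h>

#define N 10000000

int main(void)
{
    /* composite[] packs 8 flags per byte */
    unsigned char *composite = calloc(N / 8 + 1, 1);
    int i, j, count = 0;

    for (i = 2; i < N; i++) {
        if (!(composite[i >> 3] & (1 << (i & 7)))) {
            count++;                                  /* i is prime */
            for (j = i + i; j < N; j += i)
                composite[j >> 3] |= 1 << (j & 7);    /* mark its multiples */
        }
    }
    printf("%d primes below %d\n", count, N);
    free(composite);
    return 0;
}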
But as Jonatan says, in any real program the bit vector is unlikely to be the performance bottleneck.
Unless limited by the amount of memory, I would prefer to use bool.
I would bet that boolean is faster, since at a minimum a bit comes from a byte, and there is the extra bitwise operation to get at the bit you want, plus the division at a higher level to get to the byte you want.
I would use bitfields where you need a lot/have low memory requirements and booleans elsewhere. I doubt it'll make a big difference, except memory wise.
Bitwise will usually be faster, but it depends on how you access those bits. If you access them in a random order it may be slower, but accessing them in order will be a lot faster than using bools. The operators needed to access a single bit are very simple, most of them requiring just one CPU cycle, so they're not really a problem. Packing the data into bits greatly reduces memory bandwidth usage and helps the cache a lot too. Of course, the difference becomes noticeable when you access a lot of these bits (for a pixel-perfect collision algorithm, for example).
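For example, the usual accessors are just a shift and a mask each (a sketch of typical macros over an unsigned char array):

/* a: array of unsigned char, i: bit index */
#define BIT_GET(a, i)   (((a)[(i) >> 3] >> ((i) & 7)) & 1)
#define BIT_SET(a, i)   ((a)[(i) >> 3] |= 1 << ((i) & 7))
#define BIT_CLEAR(a, i) ((a)[(i) >> 3] &= ~(1 << ((i) & 7)))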
As a note, std::vector<bool> is specialized to use a single bit to store each bool.
I would bet that boolean is faster, since at a minimum a bit comes from a byte, and there is the extra bitwise operation to get at the bit you want, plus the division at a higher level to get to the byte you want.
This is what I was saying in my post about prime calculation. If the array is big enough, then the bit vector wins on performance (as well as memory usage) because you get 8 times as much information in the cache.
In magic land, it seems like bitwise would be faster to get because it's more 'low level'. But I imagine it would actually be slower since you need to operate on the bit before doing the actual true/false comparison.
Correct.
You can argue both sides of this; in the end, it'll depend on the particular application. If you're using enough bools that you're hammering the processor cache, then using bit fields may be faster because you can simply fit more data in the cache. If you're not filling up the CPU cache, then a plain bool (int) will probably be faster, since you don't have to do the extra work of extracting the bits.
Ok, you made me do it:
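In outline the two tests look something like this (a sketch only; the array size, fill pattern and timing code here are placeholders rather than the exact values used for the results below):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM (100 * 1000 * 1000)   /* placeholder size */

int main(void)
{
    unsigned char *flags = malloc(NUM);        /* test1: one byte per flag */
    unsigned char *bools = calloc(NUM / 8, 1); /* test2: 8 flags per byte  */
    long i, count;
    int byte, bit;
    clock_t t;

    for (i = 0; i < NUM; i++) {
        int v = rand() & 1;
        flags[i] = v;
        if (v)
            bools[i >> 3] |= 1 << (i & 7);
    }

    /* test1: plain byte-sized flags */
    count = 0;
    t = clock();
    for (i = 0; i < NUM; i++)
        if (flags[i])
            count++;
    printf("count = %ld time = %g seconds\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    /* test2: packed bits, walking a byte/bit counter */
    count = 0;
    byte = 0;
    bit = 0;
    t = clock();
    for (i = 0; i < NUM; i++) {
        if ((bools[byte] >> bit) & 1)
            count++;
        bit++;
        if (bit == 8) {
            byte++;
            bit = 0;
        }
    }
    printf("count = %ld time = %g seconds\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    free(flags);
    free(bools);
    return 0;
}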
test1 results:
count = 52424828 time = 0.990623 seconds
count = 52432056 time = 0.990727 seconds
count = 52430310 time = 0.991179 seconds
test2 results:
count = 52423770 time = 1.33507 seconds
count = 52422236 time = 1.33386 seconds
count = 52423791 time = 1.33561 seconds
Could you try it with
for (i = 0; i < NUM; i++) {
    if ((bools[byte] >> bit) & 1) {
        count++;
    }
    bit++;
    if (bit == 8) {
        byte++;
        bit = 0;
    }
}
Replaced by
for (i = 0; i < NUM; i++) {
    int b = i / 8;
    if ((bools[b] >> (i - b*8)) & 1) {
        count++;
    }
}
?
Just curious whether an "if" and increments are faster than a "%".
EDIT: err, my bad. Needs some more modification
EDIT2: That..might work?
That won't even work, since it never increases byte.
Updated my code. That should work?
count = 52426098
time = 1.5705 seconds
[edit] Off-topic, but interesting just how random rand() is on my system
I wonder what the difference would be for random access of bits, with such a large data set.
Edit: I missed one obvious optimization (that's what you get for only using high-level languages)
for (i = 0; i < NUM; i++) {
    int b = i >> 3;
    if ((bools[b] >> (i - b*8)) & 1) {
        count++;
    }
}
What about:
for (i = 0; i < NUM; i++) {
    if (bools[i] & 1) {
        count++;
    }
}
Yes, it will only test one of the possible 8 bits, but in your test case above you're mostly measuring the speed of changing your bit variable, not what we are interested in, which is the boolean check. In actual code you would also test against a fixed bit (like 1), so that's the speed we're interested in.
[edit:]
Or alternatively, have another loop where you do just your bit manipulation (but not the check) then subtract that time from the result.
First test machine was an Intel C2D. I tested on a single-core AMD and got:
test1: 1.3 seconds
test2: 1.5 seconds
Jonathan's test: 2.0 seconds
[edit]
Elias, actually what we're interested in is testing a bunch of bits. Timing == vs & 1 is meaningless.
[edit2]
Jonathan, with that shift it's 1.4 seconds.
Another test would be, for actual bitfields:
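Something like this, using the language's bit-field syntax and letting the compiler generate the masking (a sketch; note the compiler may pad the struct up to its allocation unit, so the memory saving isn't guaranteed):

#include <stdio.h>
#include <stdlib.h>

#define NUM 10000000

struct flags {
    /* the compiler emits the shift/mask code for these */
    unsigned a : 1, b : 1, c : 1, d : 1;
    unsigned e : 1, f : 1, g : 1, h : 1;
};

int main(void)
{
    struct flags *arr = calloc(NUM, sizeof *arr);
    long i, count = 0;

    for (i = 0; i < NUM; i++)
        arr[i].a = rand() & 1;

    for (i = 0; i < NUM; i++)
        if (arr[i].a)
            count++;

    printf("count = %ld, sizeof(struct flags) = %u\n",
           count, (unsigned)sizeof(struct flags));
    free(arr);
    return 0;
}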
[edit:]
Elias, actually what we're interested in is testing a bunch of bits. Timing == vs & 1 is meaningless.
No, it makes no difference which bit you test. But you can use this instead to test all 8, without the time-consuming code to change a variable:
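That is, something along these lines, reusing bools, NUM and count from the earlier test (a sketch):

/* read each byte once and test its 8 bits with fixed masks,
   no per-iteration bit/byte bookkeeping */
for (i = 0; i < NUM / 8; i++) {
    unsigned char c = bools[i];
    if (c & 0x01) count++;
    if (c & 0x02) count++;
    if (c & 0x04) count++;
    if (c & 0x08) count++;
    if (c & 0x10) count++;
    if (c & 0x20) count++;
    if (c & 0x40) count++;
    if (c & 0x80) count++;
}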
That test is only slightly slower than the bool test. I'll bet the compiler is using bytes there anyhow, and the compiler adds a bit of overhead
count = 52431537
time = 1.08432 seconds
the compiler adds a bit of overhead
You couldn't ask for less overhead!
I think the tests so far aren't very realistic. I'll whip up something better...
Of course if you write suboptimal code it's going to be slower
Why don't you try this instead:
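That is, roughly the following, built around the bit test quoted further down; the loop setup itself is an assumption:

/* sketch: index bits from the top of each byte, one shift + AND per test */
for (i = 0; i < NUM; i++) {
    if (bools[i >> 3] & (0x80 >> (i & 0x7)))
        count++;
}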
Test1: 0.34 sec
Test2: 0.145 sec
I don't have much time, maybe I've made some error with all those bitwise operations, so correct me if there's a mistake.
<edit>
Corrected small error: it was i&0x7, not i&0x8. (I want a mask of the last 3 bits.)
I'd bet my bamccaig's left nut this is never going to be a performance bottleneck in any real application.
You stay away from my left nut.
That's some fancy bitwise magic you got there Oscar, nice
EDIT:
But it's still a teeny bit slower than test 1 (on my machine at least...)
count = 52440724
time = 0.711079 seconds
count = 52427535
time = 0.710474 seconds
count = 52430027
time = 0.71097 seconds
vs.
count = 52432916
time = 0.864145 seconds
count = 52422009
time = 0.865 seconds
count = 52423410
time = 0.863674 seconds
Still, it's better than the other bitfield-tests.
It's still about 1/10 of a second slower here... what machine do you have that it runs in 0.1 seconds? O_O. Also, it tests the bytes in reverse order of the other program.
Here's some random access tests, in these the bool method is a bit faster:
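They check rectangular blocks of bits inside a big 2D mask, the kind of access a pixel-perfect collision test does; roughly like this sketch, with W, H and the block size as placeholders rather than the actual test code:

/* check a 32x32 block at bit position (bx, by) inside a W x H bit mask;
   the caller picks bx and by at random, e.g. rand() % (W - 32) */
#define W 4096
#define H 4096

static int block_hits(const unsigned char *mask, int bx, int by)
{
    int x, y, hits = 0;
    for (y = 0; y < 32; y++) {
        for (x = 0; x < 32; x++) {
            long pos = (long)(by + y) * W + (bx + x);
            if (mask[pos >> 3] & (1 << (pos & 7)))
                hits++;
        }
    }
    return hits;
}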
[edit]: On my amd machine the first test takes 2.3 seconds and the second 2.7. So the results are inconclusive...
Results:
count = 62508973 time = 1.71414 seconds
count = 125017946 time = 1.71485 seconds
count = 187526919 time = 1.71329 seconds
count = 250035892 time = 1.71355 seconds
count = 312544865 time = 1.71301 seconds
count = 62495742 time = 1.47988 seconds
count = 124991484 time = 1.48007 seconds
count = 187487226 time = 1.48014 seconds
count = 249982968 time = 1.47874 seconds
count = 312478710 time = 1.47809 seconds
My test:
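The shape of it is something like this (a sketch rather than the exact code I ran; the fill pattern and timing calls are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BITS 1000
#define NUM 1000000

int main(void)
{
    static unsigned char flags[BITS];          /* bool version: one byte per flag */
    static unsigned char packed[BITS / 8 + 1]; /* bit version: 8 flags per byte   */
    long i, j, count;
    clock_t t;

    for (i = 0; i < BITS; i++) {
        flags[i] = rand() & 1;
        if (flags[i])
            packed[i >> 3] |= 1 << (i & 7);
    }

    /* scan all BITS flags, NUM times over */
    count = 0;
    t = clock();
    for (j = 0; j < NUM; j++)
        for (i = 0; i < BITS; i++)
            if (flags[i])
                count++;
    printf("bool: count: %ld seconds: %f\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    count = 0;
    t = clock();
    for (j = 0; j < NUM; j++)
        for (i = 0; i < BITS; i++)
            if (packed[i >> 3] & (1 << (i & 7)))
                count++;
    printf("bits: count: %ld seconds: %f\n",
           count, (double)(clock() - t) / CLOCKS_PER_SEC);

    return 0;
}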
Output:
bool: count: 493000000 seconds: 5.320000
bits: count: 487000000 seconds: 1.900000
So, bits are more than twice as fast as bool here. If I change these two lines:
#define BITS 1000
#define NUM 1000000
to
#define BITS 1000000
#define NUM 1000
The result is:
bool: count: 499845000 seconds: 5.450000
bits: count: 498304000 seconds: 4.370000
So bool is only slightly slower now, which can be explained by having to do more loop iterations to test all the bools.
How I would interpret this is that the only difference is the used memory - the actual if() is not relevant. But with bits you use only 1/8th of the memory so you may end up using the cache more efficiently which can make it much faster.
The numbers aren't that far off... unroll the first loop and it's almost as fast as the second.
[edit]
I think my random access tests were the best as far as actual usage goes... they test 2d blocks of bits inside a 2d array of bits... what you'd do for collision detection. They may not be written optimally though. I would be interested in how those tests perform for other people and on what hardware... and I think enabling optimization in the compiler is not really fair, since these tests are so simple they can be optimized away to nothing, while in a larger program they couldn't.
What have I done!
Some more numbers from randtest1 and 2:
         Intel A100 (Atom predecessor) | Core2 Quad 2.4GHz
test 1   5.3 seconds                   | 1.3 seconds
test 2   5.3 seconds                   | 1.1 seconds
The A100 is about the same on both tests, while the still-older AMD I have is faster on the one-byte bool test. Everything else I've tried is faster on bitfields. It seems like newer CPUs are better at bitfields than older ones.
Oh, when I changed the code to fix the mistake I didn't rerun it (I was in a hurry, and I didn't expect it to change the results).
The change was this:
if (bools[i>>3] & (0x80 >> (i&0x8))) count++;
fixed to:
if (bools[i>>3] & (0x80 >> (i&0x7))) count++;
That simple change made the time go from 0.14 to 0.42. I guess the first one was optimized to a shift :/
I think enabling optimization in the compiler is not really fair, since these tests are so simple they can be optimized away to nothing, while in a larger program they couldn't.
The optimizer can't really remove anything from these tests; they're OK. I think optimizations need to be on to be able to compare how it performs in a real scenario (the bitfield version can benefit from optimizations, unless you do the implementation yourself in assembler).
I guess the first one was optimized to a shift :/
I think it evaluated to "true" a lot less, resulting in fewer "count++" being executed.