Discussion Thursdays?

Sol Blast

Good Morning! (From Ireland anyway.)

I've had this idea for a thread for a while, ignore it if you don't think it's a good one.
Basically, with all the valuable information I've picked up about concepts of C/C++ on this forum in assorted threads that I'd of otherwise have had no interest to actively look up, I've been left somewhat blood thirsty for more.
I thought it might be an idea for a thread like this one to come along and choose a topic/ concept of programming, for any member to enter and share their knowledge, experiences, warnings, tips, how it works internally and so on. We discuss that topic in depth to the exclusion of all others for the next week, and then choose another.
Every Thursday the topic will change decided by suggestions from those who've contributed.

Personally I think it'd be a great way for everyone to expand their knowledge and learn about things it never occurred to them to learn about before.

I'll get the ball rolling for this week by suggesting we discuss... Bitwise Operators? I'll leave it to the first proper contributor to choose the definite topic if they have a better idea though.
I myself am too much of a newbie to have anything useful to add of course, but I look forward to reading your replies .

Thomas Harte

Ummm, not much to say really...

All numbers are stored as a binary code, i.e. in base 2.

If you bitwise OR two bits then the result is set iff either of the two input bits were set.

If you bitwise AND two bits then the result is set iff both of the input bits were set.

If you bitwise EXCLUSIVE-OR two bits then the result is set iff exactly one of the input bits was set.

The bitwise NOT of a single bit is that bit inverted.

The effect of carrying out a bitwise operation using integers is that the bitwise operation is carried out on each digit separately.

The C operators are '|' for OR, '&' for AND, '^' for EXCLUSIVE-OR and '~' for NOT. |, & and ^ are binary operators, ~ is a unary operator.

Bitwise operations are often seen in C/C++ programs to represent flags. For example:

1#define FLAG_1    1
2#define FLAG_2    2
3#define FLAG_3    4 /* these are powers of two */
4<etc>
5 
6...
7 
8int FlagField = FLAG_1 | FLAG_2; // FlagField has flags 1 and 2 set
9 
10...
11 
12if(FlagField&FLAG_3)
13{
14   // this code is executed iff flag 3 is set
15}
16 
17if(FlagField&FLAG_2)
18{
19   // this code is executed iff flag 2 is set
20}
21 
22if(FlagField&(FLAG_1|FLAG_2))
23{
24   // this code is executed if either FLAG_1 or FLAG_2 is set
25}

Another common construction is this:

if(NewVar^OldVar&NewVar&FLAG_1)
{
   // executes only if flag 1 is set in NewVar but not in OldVar
}
OldVar = NewVar;

Which is fairly easy to understand. The EXCLUSIVE OR effectively produces a number with a bit set everywhere a bit has changed between OldVar and NewVar, the AND with NewVar keeps only those bits that have changed and are set in NewVar, and the AND with FLAG_1 throws away all those bits that aren't flag 1. Then the usual C rule of 'execute if expression is non-zero' takes the code into the if statement.

Ummm, I guess you also sometimes see:

a ^= b;
b ^= a;
a ^= b;

To do an in-place swap of the values of a and b. That works because binary operators are commutative and associative - b ^ a ^ b = a ^ b ^ b = a ^ (b ^ b) = a ^ 0 = a. And in that case we go from:

a = a, b = b; to
a = a^b, b = b; to
a = a^b, b = b^a^b; to
a = a^b^b^a^b, b = b^a^b

So at the end, a = a^b^b^a^b = (a^a) ^ (b^b) ^ b = 0 ^ 0 ^ b = b, and b = b ^ a ^ b = b ^ b ^ a = 0 ^ a = a.

It's rare that you need to do bitwise stuff other than to communicate with hardware at the lowest level, but lots of programmers with lower level experience put a lot of it around because it just matches how they think about things.

Probably lots more to say, but with these things it's all just directly consequential from the definition...

EDIT: P.s. neat idea, especially if someone is going to start transposing this stuff into the wiki?

alethiophile

The first ten or so times I saw the 'if and only if' construction 'iff', I thought it was an unusually persistent typo.

The main time I used bitwise operators was when trying to pack/unpack bytes from a long; I ended up using a union eventually.

Evert

I use bitwise operators a lot to check/set/clear bit-encoded flags, as TH stated. I tend to use hexadecimal notation rather than decimal when defining flags, mainly because it's more closely linked to the lower level bitwise encoding.

My main source of frustration is trying to use bit flags in FORTRAN. Let's just say it wasn't designed with bitwise operators in mind; it uses functions to provide similar functionality: BTEST, DSHIFTL, DSHIFTR, IAND, IBCHNG, IBCLR, IBITS, IBSET, IEOR, IOR, ISHA, ISHC, ISHFT, ISHFTC, ISHL, IXOR, LSHIFT, NOT, RSHIFT , SHIFTL, SHIFTR.
I often find myself wishing for the simplicity of bitwise operators in FORTRAN... they make the code look so much more compact and easier to follow.

SiegeLord

I either use octal (I find it easier to read the hexadecimal for these cases) when defining my flags, or bitwise shifts:

1enum Flags1
2{
3    flag1 = 01,
4    flag2 = 02,
5    flag3 = 04,
6    flag4 = 08,
7    flag5 = 010
8};
9 
10enum Flags2
11{
12    flag1 = 1 << 0,
13    flag2 = 1 << 1,
14    flag3 = 1 << 2,
15    flag4 = 1 << 3,
16    flag5 = 1 << 4
17};

bamccaig

SiegeLord said:

enum Flags1
{
    flag1 = 001,
    flag2 = 002,
    flag3 = 004,
    flag4 = 010,
    flag5 = 020
};

Fixed.

Peter Wang

What he was thinking in the first place:

enum Flags1
{
    flag1 = 0x1,
    flag2 = 0x2,
    flag3 = 0x4,
    flag4 = 0x8,
    flag5 = 0x10
};

SiegeLord

nvm, need more sleep.

Todd Cope

Quote:

The bitwise NOT of a single bit is that bit inverted.

How do you perform this in C? Can I just do:

flags ~= MYFLAG;

I need to toggle bits from time to time and have been using flags ^= MYFLAG; // edited, I put ~ by mistake but I don't really know if that is the correct way to do it.

Evert

Quote:

Can I just do:

flags ~= MYFLAG;

I need to toggle bits from time to time and have been using

flags ~= MYFLAG;

but I don't really know if that is the correct way to do it.

Only if you know MYFLAG is set.
The proper way would be flags &= ~MYFLAG.

Todd Cope

Quote:

The proper way would be flags &= ~MYFLAG.

Will that toggle MYFLAG on if it is not already on? Seems to me like it would only turn it off if it is on.

Edgar Reynaldo

To flip specific bits of a number , Bitwise XOR the number with a number that has only the bits you want to flip set to 1.

//
const int X = 1;
const int Y = 2;
const int Z = 4;

int a = X | Z;
int b = a ^ Y;
int c = X | Y | Z;
if (c == b) {XOR_1_Flips_Bits();}
//

1 XOR 0 is 1 because they are different
0 XOR 0 is 0 because they are the same
So if you XOR a bit with 0 it remains the same.

0 XOR 1 is 1 because they are different
1 XOR 1 is 0 because they are the same
So if you XOR a bit with 1 it is flipped.

Evert

Sorry, I got confused. To turn bits off (which I though was what you wanted to know), do what I said. To toggle them, logical xor is indeed the correct way to do it.

Edgar Reynaldo

I made a mistake in the code I posted - I used binary NOT (~) instead of XOR (^), and I compared a to c when I meant to compare b to c. It is corrected now.

On a different note, I thought XOR variable swapping was so cool when I first saw it, but it performs much more slowly than using temporary variables to swap the values.

TestIntSwapSpeed.cpp

1//
2 
3void IntXorSwap(int& A , int& B) {
4    A ^= B;// A now holds the difference between the original A and B
5    B ^= A;// B now holds the original value of A since B = (B^A which equals A^B which equals {originalA ^ B}^B )
6    A ^= B;// A now holds the original value of B since A = (A^B which equals B^A) and since A held the original difference between A and B
7}
8void IntTempSwap(int& A , int& B) {
9  int temp = A;
10  A = B;
11  B = temp;
12}
13 
14void TestXorSwap(int a , int b , unsigned int ntests) {
15   for (unsigned int i = 0 ; i < ntests ; ++i) {
16      IntXorSwap(a,b);
17   }
18}
19 
20void TestTempSwap(int a , int b , unsigned int ntests) {
21   for (unsigned int i = 0 ; i < ntests ; ++i) {
22      IntTempSwap(a,b);
23   }
24}
25 
26int main() {
27  
28  int inta = -75 , intb = 14011;
29  const unsigned int num_tests = 500000000;//500 million
30  
31  TestXorSwap(inta , intb , num_tests);
32  TestTempSwap(inta , intb , num_tests);
33  
34  return 0;
35}
36 
37//

Profiling results :

Unoptimized

mingw32-g++ -Wall -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 31.05      3.94     3.94 500000000     0.00     0.00  IntXorSwap(int&, int&)
 24.27      7.02     3.08 500000000     0.00     0.00  IntTempSwap(int&, int&)
 22.62      9.89     2.87        1     2.87     5.95  TestTempSwap(int, int, unsigned int)
 22.06     12.69     2.80        1     2.80     6.74  TestXorSwap(int, int, unsigned int)

Optimized with -O2

mingw32-g++ -Wall -pg -O2 -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 49.13      3.10     3.10 500000000     0.00     0.00  IntXorSwap(int&, int&)
 27.42      4.83     1.73 500000000     0.00     0.00  IntTempSwap(int&, int&)
 11.89      5.58     0.75        1     0.75     2.48  TestTempSwap(int, int, unsigned int)
 11.57      6.31     0.73        1     0.73     3.83  TestXorSwap(int, int, unsigned int)

So unoptimized, 500,000,000 swaps using XOR took 3.94 seconds and using a temp variable took 3.08 seconds.
Optimized, 500,000,000 swaps using XOR took 3.10 seconds and using a temp variable took 1.73 seconds.

Percentage wise, unoptimized XOR swapping takes ~128% as much time as temp swapping, and optimized XOR swapping takes 180% as much time as temp swapping for integers.

Here's the assembly code MinGW produces when there's no profiling :

Unoptimized :

mingw32-g++ -Wall -S -o TestIntSwapSpeed_asm.txt TestIntSwapSpeed.cpp

.file  "TestIntSwapSpeed.cpp"
.text
.align 2
globl __Z10IntXorSwapRiS_
.def  __Z10IntXorSwapRiS_;  .scl  2;  .type  32;  .endef
6__Z10IntXorSwapRiS_:
pushl  %ebp
movl  %esp, %ebp
movl  8(%ebp), %ecx
movl  8(%ebp), %edx
movl  12(%ebp), %eax
movl  (%eax), %eax
xorl  (%edx), %eax
movl  %eax, (%ecx)
movl  12(%ebp), %ecx
movl  12(%ebp), %edx
movl  8(%ebp), %eax
movl  (%eax), %eax
xorl  (%edx), %eax
movl  %eax, (%ecx)
movl  8(%ebp), %ecx
movl  8(%ebp), %edx
movl  12(%ebp), %eax
movl  (%eax), %eax
xorl  (%edx), %eax
movl  %eax, (%ecx)
popl  %ebp
ret
.align 2
globl __Z11IntTempSwapRiS_
.def  __Z11IntTempSwapRiS_;  .scl  2;  .type  32;  .endef
32__Z11IntTempSwapRiS_:
pushl  %ebp
movl  %esp, %ebp
subl  $4, %esp
movl  8(%ebp), %eax
movl  (%eax), %eax
movl  %eax, -4(%ebp)
movl  8(%ebp), %edx
movl  12(%ebp), %eax
movl  (%eax), %eax
movl  %eax, (%edx)
movl  12(%ebp), %edx
movl  -4(%ebp), %eax
movl  %eax, (%edx)
leave
ret
.align 2
globl __Z11TestXorSwapiij
.def  __Z11TestXorSwapiij;  .scl  2;  .type  32;  .endef
51__Z11TestXorSwapiij:
pushl  %ebp
movl  %esp, %ebp
subl  $12, %esp
movl  $0, -4(%ebp)
56L4:
movl  -4(%ebp), %eax
cmpl  16(%ebp), %eax
jae  L3
leal  12(%ebp), %eax
movl  %eax, 4(%esp)
leal  8(%ebp), %eax
movl  %eax, (%esp)
call  __Z10IntXorSwapRiS_
leal  -4(%ebp), %eax
incl  (%eax)
jmp  L4
68L3:
leave
ret
.align 2
globl __Z12TestTempSwapiij
.def  __Z12TestTempSwapiij;  .scl  2;  .type  32;  .endef
74__Z12TestTempSwapiij:
pushl  %ebp
movl  %esp, %ebp
subl  $12, %esp
movl  $0, -4(%ebp)
79L8:
movl  -4(%ebp), %eax
cmpl  16(%ebp), %eax
jae  L7
leal  12(%ebp), %eax
movl  %eax, 4(%esp)
leal  8(%ebp), %eax
movl  %eax, (%esp)
call  __Z11IntTempSwapRiS_
leal  -4(%ebp), %eax
incl  (%eax)
jmp  L8
91L7:
leave
ret
.def  ___main;  .scl  2;  .type  32;  .endef
.align 2
globl _main
.def  _main;  .scl  2;  .type  32;  .endef
98_main:
pushl  %ebp
movl  %esp, %ebp
subl  $40, %esp
andl  $-16, %esp
movl  $0, %eax
addl  $15, %eax
addl  $15, %eax
shrl  $4, %eax
sall  $4, %eax
movl  %eax, -16(%ebp)
movl  -16(%ebp), %eax
call  __alloca
call  ___main
movl  $-75, -4(%ebp)
movl  $14011, -8(%ebp)
movl  $500000000, -12(%ebp)
movl  $500000000, 8(%esp)
movl  -8(%ebp), %eax
movl  %eax, 4(%esp)
movl  -4(%ebp), %eax
movl  %eax, (%esp)
call  __Z11TestXorSwapiij
movl  $500000000, 8(%esp)
movl  -8(%ebp), %eax
movl  %eax, 4(%esp)
movl  -4(%ebp), %eax
movl  %eax, (%esp)
call  __Z12TestTempSwapiij
movl  $0, %eax
leave
ret

Optimized :

mingw32-g++ -Wall -O2 -S -o TestIntSwapSpeed_O2_asm.txt TestIntSwapSpeed.cpp

.file  "TestIntSwapSpeed.cpp"
.text
.align 2
.p2align 4,,15
globl __Z10IntXorSwapRiS_
.def  __Z10IntXorSwapRiS_;  .scl  2;  .type  32;  .endef
7__Z10IntXorSwapRiS_:
pushl  %ebp
movl  %esp, %ebp
movl  12(%ebp), %edx
movl  8(%ebp), %ecx
movl  (%edx), %eax
xorl  (%ecx), %eax
movl  %eax, (%ecx)
xorl  (%edx), %eax
movl  %eax, (%edx)
xorl  %eax, (%ecx)
popl  %ebp
ret
.align 2
.p2align 4,,15
globl __Z11IntTempSwapRiS_
.def  __Z11IntTempSwapRiS_;  .scl  2;  .type  32;  .endef
24__Z11IntTempSwapRiS_:
pushl  %ebp
movl  %esp, %ebp
movl  8(%ebp), %edx
pushl  %ebx
movl  12(%ebp), %ecx
movl  (%edx), %ebx
movl  (%ecx), %eax
movl  %eax, (%edx)
movl  %ebx, (%ecx)
popl  %ebx
popl  %ebp
ret
.align 2
.p2align 4,,15
globl __Z11TestXorSwapiij
.def  __Z11TestXorSwapiij;  .scl  2;  .type  32;  .endef
41__Z11TestXorSwapiij:
pushl  %ebp
movl  %esp, %ebp
subl  $20, %esp
movl  %edi, -4(%ebp)
movl  16(%ebp), %edi
movl  %ebx, -12(%ebp)
xorl  %ebx, %ebx
cmpl  %edi, %ebx
movl  %esi, -8(%ebp)
jae  L9
leal  12(%ebp), %esi
.p2align 4,,15
54L7:
movl  %esi, 4(%esp)
leal  8(%ebp), %eax
incl  %ebx
movl  %eax, (%esp)
call  __Z10IntXorSwapRiS_
cmpl  %edi, %ebx
jb  L7
62L9:
movl  -12(%ebp), %ebx
movl  -8(%ebp), %esi
movl  -4(%ebp), %edi
movl  %ebp, %esp
popl  %ebp
ret
.align 2
.p2align 4,,15
globl __Z12TestTempSwapiij
.def  __Z12TestTempSwapiij;  .scl  2;  .type  32;  .endef
73__Z12TestTempSwapiij:
pushl  %ebp
movl  %esp, %ebp
subl  $20, %esp
movl  %edi, -4(%ebp)
movl  16(%ebp), %edi
movl  %ebx, -12(%ebp)
xorl  %ebx, %ebx
cmpl  %edi, %ebx
movl  %esi, -8(%ebp)
jae  L17
leal  12(%ebp), %esi
.p2align 4,,15
86L15:
movl  %esi, 4(%esp)
leal  8(%ebp), %eax
incl  %ebx
movl  %eax, (%esp)
call  __Z11IntTempSwapRiS_
cmpl  %edi, %ebx
jb  L15
94L17:
movl  -12(%ebp), %ebx
movl  -8(%ebp), %esi
movl  -4(%ebp), %edi
movl  %ebp, %esp
popl  %ebp
ret
.def  ___main;  .scl  2;  .type  32;  .endef
.align 2
.p2align 4,,15
globl _main
.def  _main;  .scl  2;  .type  32;  .endef
106_main:
pushl  %ebp
movl  $16, %eax
movl  %esp, %ebp
subl  $24, %esp
andl  $-16, %esp
call  __alloca
call  ___main
movl  $-75, (%esp)
movl  $500000000, %eax
movl  $14011, %ecx
movl  %eax, 8(%esp)
movl  %ecx, 4(%esp)
call  __Z11TestXorSwapiij
movl  $-75, (%esp)
movl  $14011, %eax
movl  $500000000, %edx
movl  %eax, 4(%esp)
movl  %edx, 8(%esp)
call  __Z12TestTempSwapiij
leave
xorl  %eax, %eax
ret

Assembly operation tallies :

Version                           pushl  popl  movl  xorl  subl leave   total
-----------------------------------------------------------------------------------
Unoptimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1    16     3     0     0      21
Temp swap (__Z11IntTempSwapRiS_)      1     0    11     0     1     1      14     

Optimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1     6     3     0     0      11
Temp swap (__Z11IntTempSwapRiS_)      2     2     7     0     0     0      11

The xorl operation must be pretty slow if both optimized versions take the same number of operations but the XOR swap takes 9/5 as long to complete.

anonymous

The xor-swap is also missing an identity check (it assumes it has two variables for storage and won't work if A and B refer to the same variable).

void IntXorSwap(int& A , int& B) 
{
    if (&A != &B) {
        A ^= B;
        B ^= A;
        A ^= B;
    }
}

Edgar Reynaldo

You're right. The XOR method will be even slower then.

Unoptimized :

mingw32-g++ -Wall -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 34.43      4.61     4.61 500000000     0.00     0.00  IntXorSwap(int&, int&)
 23.30      7.73     3.12 500000000     0.00     0.00  IntTempSwap(int&, int&)
 22.03     10.68     2.95        1     2.95     7.56  TestXorSwap(int, int, unsigned int)
 20.24     13.39     2.71        1     2.71     5.83  TestTempSwap(int, int, unsigned int)

Optimized with -O2

mingw32-g++ -Wall -O2 -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 53.09      2.92     2.92 500000000     0.00     0.00  IntXorSwap(int&, int&)
 26.18      4.36     1.44 500000000     0.00     0.00  IntTempSwap(int&, int&)
 13.45      5.10     0.74        1     0.74     2.18  TestTempSwap(int, int, unsigned int)
  7.27      5.50     0.40        1     0.40     3.32  TestXorSwap(int, int, unsigned int)

Execution time ratio for updated XOR method to temp method :
Unoptimized : ~148%
Optimized : ~203%

Assembly output
(unoptimized and optimized versions of _Z10IntXorSwapRiS together)

1//
2// unoptimized
globl __Z10IntXorSwapRiS_
.def  __Z10IntXorSwapRiS_;  .scl  2;  .type  32;  .endef
5__Z10IntXorSwapRiS_:
pushl  %ebp
movl  %esp, %ebp
movl  8(%ebp), %eax
cmpl  12(%ebp), %eax
je  L1
movl  8(%ebp), %ecx
movl  8(%ebp), %edx
movl  12(%ebp), %eax
movl  (%eax), %eax
xorl  (%edx), %eax
movl  %eax, (%ecx)
movl  12(%ebp), %ecx
movl  12(%ebp), %edx
movl  8(%ebp), %eax
movl  (%eax), %eax
xorl  (%edx), %eax
movl  %eax, (%ecx)
movl  8(%ebp), %ecx
movl  8(%ebp), %edx
movl  12(%ebp), %eax
movl  (%eax), %eax
xorl  (%edx), %eax
movl  %eax, (%ecx)
29L1:
popl  %ebp
ret
.align 2
33 
34// optimized
globl __Z10IntXorSwapRiS_
.def  __Z10IntXorSwapRiS_;  .scl  2;  .type  32;  .endef
37__Z10IntXorSwapRiS_:
pushl  %ebp
movl  %esp, %ebp
movl  8(%ebp), %ecx
movl  12(%ebp), %edx
cmpl  %edx, %ecx
je  L1
movl  (%edx), %eax
xorl  (%ecx), %eax
movl  %eax, (%ecx)
xorl  (%edx), %eax
movl  %eax, (%edx)
xorl  %eax, (%ecx)
50L1:
popl  %ebp
ret
.align 2
.p2align 4,,15
55 
56//

Tallies :

Version                           pushl  popl  cmpl  movl  xorl    je   total
-----------------------------------------------------------------------------------
Unoptimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1     1    17     3     1      24
Optimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1     1     6     3     1      13

It's interesting to see that the -O2 version was able to trim off 11 movl operations. Is that the difference between keeping the data in the registers and checking it from memory?

Thomas Harte

Quote:

On a different note, I thought XOR variable swapping was so cool when I first saw it, but it performs much more slowly than using temporary variables to swap the values.

Yep, it long ago stopped being a fast trick and is now just something worth mentioning to ensure that everyone's on the same page with respect to the commutative and associative properties of bitwise operators.

You will sometimes still see people using it, but as with a lot of things, it's mostly personal taste and probably marks them out as an older programmer; on something like a z80 where xors cost exactly the same as inter-register loads, three xors superficially costs the same in cycles but saves a valuable register. On modern x86 I think we all know that the 'registers' aren't actually registers but rather merely a shorthand to describe how subsequent values relate to earlier ones and the bottlenecks that actually cost are completely unrelated to how quickly you can swap values for any application that anybody actually cares about.

Evert

How about "a^=a" versus "a=0"? Is the former still faster? I can't imagine it is.

Quote:

as with a lot of things, it's mostly personal taste and probably marks them out as an older programmer

A preference for obfuscated code? Hmm...

Quote:

On modern x86 I think we all know that the 'registers' aren't actually registers but rather merely a shorthand to describe how subsequent values relate to earlier ones

I didn't know! Do you have a link where this is discussed in a bit more detail?

Thomas Harte

Quote:

I didn't know! Do you have a link where this is discussed in a bit more detail?

Ummm, I was really just making oblique reference to the combination of feeding multiple simultaneous pipelines through data dependency calculation and, I guess, a little bit to caching. So if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation and not have to worry about the two strands being in synchronisation until something tries to combine data from both streams.

Add that to processors like the Pentium Pro on which the first-level cache is operating at the same speed as the CPU and you're really fighting to find a reason that the registers are any more significant than any of the other temporary stores hanging around a chip. Especially where hyperthreading making it less logical to have any sort of primary source of data for calculations.

That's before you start discussing how microcoding often maps the x86 instruction set to a more RISCy internal set for the actual processing unit that tends not to have a 1:1 correlation to the named x86 registers.

I would therefore distinguish modern registers from the registers of, e.g. the z80, because you can no longer say from looking at your assembly exactly what the state of the named registers is after every instruction. Nowadays you just work on being certain that the state would be reported as a certain thing were you to query it. Which is obviously compatible with the old model but completely changes the field of optimisation.

I'm talking generally because I did a general degree way back when and have not subsequently interested myself with specifics.

EDIT:

Quote:

A preference for obfuscated code? Hmm...

I don't think it's obfuscated for them. It's just part of the way the brain recognises things; like the way it recognises sentence structures and can fix bits sometimes without you being consciously aware (which is annoying when you're an editor) - they're used to being able to instantly spot that a certain pattern does a certain thing to the extent where 'a ^= b; b ^= a; a ^= b;' just reads instantly as 'swap a and b', not 'a exclusive or with b, store result to a; ...'.

I was really just trying to say "you may see this around still, but it's not some sort of magical recipe and don't bother adapting yourself to it now".

alethiophile

I like xor-swap because it's easy to implement as a CPP macro--no passing pointers to a function and no temp variable. The only place I've ever used bitshifts and bitwise things like that is in a very simplistic implementation of ARC4; I'm not doing things like that much anymore.

Evert

Quote:

I was really just making oblique reference to the combination of feeding multiple simultaneous pipelines through data dependency calculation and, I guess, a little bit to caching. So if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation and not have to worry about the two strands being in synchronisation until something tries to combine data from both streams.

Ah, yes! That makes sense, actually. I'd never thought about this (in terms of assembler language and processor architecture I never really went much beyond the 486 although I know (knew) a few basic things about the AMD64).

Quote:

I'm talking generally because I did a general degree way back when and have not subsequently interested myself with specifics.

That's fine, this is exactly the right level of explanation that I was looking for. Thanks.

Vanneto

Quote:

I like xor-swap because it's easy to implement as a CPP macro--no passing pointers to a function and no temp variable.

If you create a function, there are no temp variables (that you see or care about). I personally always use the std functions wherever possible:

int a = 1;
int b = 2;

std::swap (a, b);

Roy Underthump

Quote:

if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation

Registers themselves have "shadow copies" (up to 128 for eax) so that if your lame compiler uses some particular register to do one set of operations, then uses it immediately afterward to do some other independant set of ops, it can use a shadow register to do the second set of ops before the first set has finished.

alethiophile

How does that std::swap work? It shouldn't be able to swap variables in the calling context without getting pointers to them, unless C++ is much weirder than I thought.

Vanneto

Ever heard of references?

void change (int &var)
{
    var++;
}

int main ()
{
   int var = 1;
   change(var);

   // var is now 2! \o/
}

alethiophile

Actually I haven't. I don't use C++; I only know it to the extent that it's based on C.

Vanneto

Well if you are interested here is a nice entry in C++ FAQ Lite.

Edgar Reynaldo

Quote:

How does that std::swap work? It shouldn't be able to swap variables in the calling context without getting pointers to them, unless C++ is much weirder than I thought.

A reference works like a pointer, but acts like an object.

mingw/include/c++/3.4.5/bits/stl_algobase.h lines 114 - 133

1//
/**
 *  @brief Swaps two values.
 *  @param  a  A thing of arbitrary type.
 *  @param  b  Another thing of arbitrary type.
 *  @return   Nothing.
 *
 *  This is the simple classic generic implementation.  It will work on
 *  any type which has a copy constructor and an assignment operator.
*/
template<typename _Tp>
  inline void
  swap(_Tp& __a, _Tp& __b)
  {
    // concept requirements
    __glibcxx_function_requires(_SGIAssignableConcept<_Tp>)
17 
    const _Tp __tmp = __a;
    __a = __b;
    __b = __tmp;
  }
22 
23//

That's just one of many swaps though, but it's possible they mostly just use that one in varying ways - grep for swap in your compiler include directory.

Wetimer

One way to use bits in numbers is as a set. Suppose that you have ten items and want to represent subsets of those items. Simply use the first ten bits of an int, where 1 means its included and 0 means it is not included.

To check if a particular item is included

subset_flags & (1 << index)

Flip it using

subset_flag ^= (1 << index)

However, the neatest part is iteration

for(int subset = 0; subset < ( 1 << count_of_items_in_set);subset++)

This will iterate through all possible subsets. But due to nature of order through which it will go through the possiblities, for every set under consideration, all possible subsets will have already undergone consideration.

I.e. for the subset

1101

All of the following are before it:

1100
1001
0101
0001

Sol Blast

Topic's been dead for a few days I know, but I'll stick to with the plan anyway .

There was alot of good reading in this so far. It seems that a few folk learnt something, so the thread is working .
Cheers to anyone who contributed this week.
I was intrigued by Thomas Harte's idea of putting this stuff into the wiki if the powers that be feel the information inputted here is valuable enough so far?
If so, I'd be happy to type it up in article form myself if no-one else would rather do it (giving credit to the original contributors of course ).

Either way however, the topic changes tomorrow. Has anyone got any suggestion for a new one they feel there's plenty of dicussion potential in?

weapon_S

Maybe you already noticed I tried to make a wiki entry under "bitwise operators".
But now I can't log-in anymore. I attached my second draft version you might use to complete it.(Notice I dubbed it "Thursday's topic".)
The reason last topic went so well however, is that there's little discussion about these things. And there is verifiable information. If I suggested "project(file)s" or "Open source game development", your average discussion chaos would occur, procuring any effort to transpose it into the wiki.
Oh, what also would be nice if you updated (in the future) your first post to have an index.

Sol Blast

Ahh, I seen that topic but i never noticed it was related to this topic haha.
I can't check our your attachment from where I am right now, but I'll have a look tonight and see if i can finish it off/ post it on the wiki if you're cool with that .
About the subject matter, i think you're right. I'd love to hear what people have to say about proper OOP (I just can't get it right! ), but the subject is covered in depth in so many places that i don't think anyone would really feel it's worth it to offer their two cents.

One idea I had been thinking about for discussion this week is function pointers. Any objections or better ideas?

Thomas Harte

Will you be just carrying on this thread or starting a new one?

Without wanting to say anything of them before Thursday, I can think of at least five things to say about function pointers...

Sol Blast

Wonderful. I look forward to hearing them .

I wasn't so sure about whether to start a new thread or not though.
On one hand it would be clearer, although on the other it may be regarded as 'spammy'.

Vanneto

1 Thread once a week is definitely not spammy. Open up a new thread to avoid confusion and mixing up subjects.

Sol Blast

Thy will be done .

Thread #598576. Printed from Allegro.cc

1	#define FLAG_1 1
2	#define FLAG_2 2
3	#define FLAG_3 4 /* these are powers of two */
4	<etc>
5
6	...
7
8	int FlagField = FLAG_1 \| FLAG_2; // FlagField has flags 1 and 2 set
9
10	...
11
12	if(FlagField&FLAG_3)
13	{
14	// this code is executed iff flag 3 is set
15	}
16
17	if(FlagField&FLAG_2)
18	{
19	// this code is executed iff flag 2 is set
20	}
21
22	if(FlagField&(FLAG_1\|FLAG_2))
23	{
24	// this code is executed if either FLAG_1 or FLAG_2 is set
25	}

1	enum Flags1
2	{
3	flag1 = 01,
4	flag2 = 02,
5	flag3 = 04,
6	flag4 = 08,
7	flag5 = 010
8	};
9
10	enum Flags2
11	{
12	flag1 = 1 << 0,
13	flag2 = 1 << 1,
14	flag3 = 1 << 2,
15	flag4 = 1 << 3,
16	flag5 = 1 << 4
17	};

1	//
2
3	void IntXorSwap(int& A , int& B) {
4	A ^= B;// A now holds the difference between the original A and B
5	B ^= A;// B now holds the original value of A since B = (B^A which equals A^B which equals {originalA ^ B}^B )
6	A ^= B;// A now holds the original value of B since A = (A^B which equals B^A) and since A held the original difference between A and B
7	}
8	void IntTempSwap(int& A , int& B) {
9	int temp = A;
10	A = B;
11	B = temp;
12	}
13
14	void TestXorSwap(int a , int b , unsigned int ntests) {
15	for (unsigned int i = 0 ; i < ntests ; ++i) {
16	IntXorSwap(a,b);
17	}
18	}
19
20	void TestTempSwap(int a , int b , unsigned int ntests) {
21	for (unsigned int i = 0 ; i < ntests ; ++i) {
22	IntTempSwap(a,b);
23	}
24	}
25
26	int main() {
27
28	int inta = -75 , intb = 14011;
29	const unsigned int num_tests = 500000000;//500 million
30
31	TestXorSwap(inta , intb , num_tests);
32	TestTempSwap(inta , intb , num_tests);
33
34	return 0;
35	}
36
37	//

1	.file "TestIntSwapSpeed.cpp"
2	.text
3	.align 2
4	.globl __Z10IntXorSwapRiS_
5	.def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
6	__Z10IntXorSwapRiS_:
7	pushl %ebp
8	movl %esp, %ebp
9	movl 8(%ebp), %ecx
10	movl 8(%ebp), %edx
11	movl 12(%ebp), %eax
12	movl (%eax), %eax
13	xorl (%edx), %eax
14	movl %eax, (%ecx)
15	movl 12(%ebp), %ecx
16	movl 12(%ebp), %edx
17	movl 8(%ebp), %eax
18	movl (%eax), %eax
19	xorl (%edx), %eax
20	movl %eax, (%ecx)
21	movl 8(%ebp), %ecx
22	movl 8(%ebp), %edx
23	movl 12(%ebp), %eax
24	movl (%eax), %eax
25	xorl (%edx), %eax
26	movl %eax, (%ecx)
27	popl %ebp
28	ret
29	.align 2
30	.globl __Z11IntTempSwapRiS_
31	.def __Z11IntTempSwapRiS_; .scl 2; .type 32; .endef
32	__Z11IntTempSwapRiS_:
33	pushl %ebp
34	movl %esp, %ebp
35	subl $4, %esp
36	movl 8(%ebp), %eax
37	movl (%eax), %eax
38	movl %eax, -4(%ebp)
39	movl 8(%ebp), %edx
40	movl 12(%ebp), %eax
41	movl (%eax), %eax
42	movl %eax, (%edx)
43	movl 12(%ebp), %edx
44	movl -4(%ebp), %eax
45	movl %eax, (%edx)
46	leave
47	ret
48	.align 2
49	.globl __Z11TestXorSwapiij
50	.def __Z11TestXorSwapiij; .scl 2; .type 32; .endef
51	__Z11TestXorSwapiij:
52	pushl %ebp
53	movl %esp, %ebp
54	subl $12, %esp
55	movl $0, -4(%ebp)
56	L4:
57	movl -4(%ebp), %eax
58	cmpl 16(%ebp), %eax
59	jae L3
60	leal 12(%ebp), %eax
61	movl %eax, 4(%esp)
62	leal 8(%ebp), %eax
63	movl %eax, (%esp)
64	call __Z10IntXorSwapRiS_
65	leal -4(%ebp), %eax
66	incl (%eax)
67	jmp L4
68	L3:
69	leave
70	ret
71	.align 2
72	.globl __Z12TestTempSwapiij
73	.def __Z12TestTempSwapiij; .scl 2; .type 32; .endef
74	__Z12TestTempSwapiij:
75	pushl %ebp
76	movl %esp, %ebp
77	subl $12, %esp
78	movl $0, -4(%ebp)
79	L8:
80	movl -4(%ebp), %eax
81	cmpl 16(%ebp), %eax
82	jae L7
83	leal 12(%ebp), %eax
84	movl %eax, 4(%esp)
85	leal 8(%ebp), %eax
86	movl %eax, (%esp)
87	call __Z11IntTempSwapRiS_
88	leal -4(%ebp), %eax
89	incl (%eax)
90	jmp L8
91	L7:
92	leave
93	ret
94	.def ___main; .scl 2; .type 32; .endef
95	.align 2
96	.globl _main
97	.def _main; .scl 2; .type 32; .endef
98	_main:
99	pushl %ebp
100	movl %esp, %ebp
101	subl $40, %esp
102	andl $-16, %esp
103	movl $0, %eax
104	addl $15, %eax
105	addl $15, %eax
106	shrl $4, %eax
107	sall $4, %eax
108	movl %eax, -16(%ebp)
109	movl -16(%ebp), %eax
110	call __alloca
111	call ___main
112	movl $-75, -4(%ebp)
113	movl $14011, -8(%ebp)
114	movl $500000000, -12(%ebp)
115	movl $500000000, 8(%esp)
116	movl -8(%ebp), %eax
117	movl %eax, 4(%esp)
118	movl -4(%ebp), %eax
119	movl %eax, (%esp)
120	call __Z11TestXorSwapiij
121	movl $500000000, 8(%esp)
122	movl -8(%ebp), %eax
123	movl %eax, 4(%esp)
124	movl -4(%ebp), %eax
125	movl %eax, (%esp)
126	call __Z12TestTempSwapiij
127	movl $0, %eax
128	leave
129	ret

1	//
2	// unoptimized
3	.globl __Z10IntXorSwapRiS_
4	.def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
5	__Z10IntXorSwapRiS_:
6	pushl %ebp
7	movl %esp, %ebp
8	movl 8(%ebp), %eax
9	cmpl 12(%ebp), %eax
10	je L1
11	movl 8(%ebp), %ecx
12	movl 8(%ebp), %edx
13	movl 12(%ebp), %eax
14	movl (%eax), %eax
15	xorl (%edx), %eax
16	movl %eax, (%ecx)
17	movl 12(%ebp), %ecx
18	movl 12(%ebp), %edx
19	movl 8(%ebp), %eax
20	movl (%eax), %eax
21	xorl (%edx), %eax
22	movl %eax, (%ecx)
23	movl 8(%ebp), %ecx
24	movl 8(%ebp), %edx
25	movl 12(%ebp), %eax
26	movl (%eax), %eax
27	xorl (%edx), %eax
28	movl %eax, (%ecx)
29	L1:
30	popl %ebp
31	ret
32	.align 2
33
34	// optimized
35	.globl __Z10IntXorSwapRiS_
36	.def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
37	__Z10IntXorSwapRiS_:
38	pushl %ebp
39	movl %esp, %ebp
40	movl 8(%ebp), %ecx
41	movl 12(%ebp), %edx
42	cmpl %edx, %ecx
43	je L1
44	movl (%edx), %eax
45	xorl (%ecx), %eax
46	movl %eax, (%ecx)
47	xorl (%edx), %eax
48	movl %eax, (%edx)
49	xorl %eax, (%ecx)
50	L1:
51	popl %ebp
52	ret
53	.align 2
54	.p2align 4,,15
55
56	//

1	//
2	/**
3	* @brief Swaps two values.
4	* @param a A thing of arbitrary type.
5	* @param b Another thing of arbitrary type.
6	* @return Nothing.
7	*
8	* This is the simple classic generic implementation. It will work on
9	* any type which has a copy constructor and an assignment operator.
10	*/
11	template<typename _Tp>
12	inline void
13	swap(_Tp& __a, _Tp& __b)
14	{
15	// concept requirements
16	__glibcxx_function_requires(_SGIAssignableConcept<_Tp>)
17
18	const _Tp __tmp = __a;
19	__a = __b;
20	__b = __tmp;
21	}
22
23	//