Good Morning! (From Ireland anyway.)
I've had this idea for a thread for a while, ignore it if you don't think it's a good one.
Basically, with all the valuable information I've picked up about concepts of C/C++ on this forum in assorted threads that I'd of otherwise have had no interest to actively look up, I've been left somewhat blood thirsty for more.
I thought it might be an idea for a thread like this one to come along and choose a topic/ concept of programming, for any member to enter and share their knowledge, experiences, warnings, tips, how it works internally and so on. We discuss that topic in depth to the exclusion of all others for the next week, and then choose another.
Every Thursday the topic will change decided by suggestions from those who've contributed.
Personally I think it'd be a great way for everyone to expand their knowledge and learn about things it never occurred to them to learn about before.
I'll get the ball rolling for this week by suggesting we discuss... Bitwise Operators? I'll leave it to the first proper contributor to choose the definite topic if they have a better idea though.
I myself am too much of a newbie to have anything useful to add of course, but I look forward to reading your replies
.
Ummm, not much to say really...
All numbers are stored as a binary code, i.e. in base 2.
If you bitwise OR two bits then the result is set iff either of the two input bits were set.
If you bitwise AND two bits then the result is set iff both of the input bits were set.
If you bitwise EXCLUSIVE-OR two bits then the result is set iff exactly one of the input bits was set.
The bitwise NOT of a single bit is that bit inverted.
The effect of carrying out a bitwise operation using integers is that the bitwise operation is carried out on each digit separately.
The C operators are '|' for OR, '&' for AND, '^' for EXCLUSIVE-OR and '~' for NOT. |, & and ^ are binary operators, ~ is a unary operator.
Bitwise operations are often seen in C/C++ programs to represent flags. For example:
| 1 | #define FLAG_1 1 |
| 2 | #define FLAG_2 2 |
| 3 | #define FLAG_3 4 /* these are powers of two */ |
| 4 | <etc> |
| 5 | |
| 6 | ... |
| 7 | |
| 8 | int FlagField = FLAG_1 | FLAG_2; // FlagField has flags 1 and 2 set |
| 9 | |
| 10 | ... |
| 11 | |
| 12 | if(FlagField&FLAG_3) |
| 13 | { |
| 14 | // this code is executed iff flag 3 is set |
| 15 | } |
| 16 | |
| 17 | if(FlagField&FLAG_2) |
| 18 | { |
| 19 | // this code is executed iff flag 2 is set |
| 20 | } |
| 21 | |
| 22 | if(FlagField&(FLAG_1|FLAG_2)) |
| 23 | { |
| 24 | // this code is executed if either FLAG_1 or FLAG_2 is set |
| 25 | } |
Another common construction is this:
if(NewVar^OldVar&NewVar&FLAG_1) { // executes only if flag 1 is set in NewVar but not in OldVar } OldVar = NewVar;
Which is fairly easy to understand. The EXCLUSIVE OR effectively produces a number with a bit set everywhere a bit has changed between OldVar and NewVar, the AND with NewVar keeps only those bits that have changed and are set in NewVar, and the AND with FLAG_1 throws away all those bits that aren't flag 1. Then the usual C rule of 'execute if expression is non-zero' takes the code into the if statement.
Ummm, I guess you also sometimes see:
a ^= b; b ^= a; a ^= b;
To do an in-place swap of the values of a and b. That works because binary operators are commutative and associative - b ^ a ^ b = a ^ b ^ b = a ^ (b ^ b) = a ^ 0 = a. And in that case we go from:
a = a, b = b; to
a = a^b, b = b; to
a = a^b, b = b^a^b; to
a = a^b^b^a^b, b = b^a^b
So at the end, a = a^b^b^a^b = (a^a) ^ (b^b) ^ b = 0 ^ 0 ^ b = b, and b = b ^ a ^ b = b ^ b ^ a = 0 ^ a = a.
It's rare that you need to do bitwise stuff other than to communicate with hardware at the lowest level, but lots of programmers with lower level experience put a lot of it around because it just matches how they think about things.
Probably lots more to say, but with these things it's all just directly consequential from the definition...
EDIT: P.s. neat idea, especially if someone is going to start transposing this stuff into the wiki?
The first ten or so times I saw the 'if and only if' construction 'iff', I thought it was an unusually persistent typo. 
The main time I used bitwise operators was when trying to pack/unpack bytes from a long; I ended up using a union eventually.
I use bitwise operators a lot to check/set/clear bit-encoded flags, as TH stated. I tend to use hexadecimal notation rather than decimal when defining flags, mainly because it's more closely linked to the lower level bitwise encoding.
My main source of frustration is trying to use bit flags in FORTRAN. Let's just say it wasn't designed with bitwise operators in mind; it uses functions to provide similar functionality: BTEST, DSHIFTL, DSHIFTR, IAND, IBCHNG, IBCLR, IBITS, IBSET, IEOR, IOR, ISHA, ISHC, ISHFT, ISHFTC, ISHL, IXOR, LSHIFT, NOT, RSHIFT , SHIFTL, SHIFTR.
I often find myself wishing for the simplicity of bitwise operators in FORTRAN... they make the code look so much more compact and easier to follow.
I either use octal (I find it easier to read the hexadecimal for these cases) when defining my flags, or bitwise shifts:
| 1 | enum Flags1 |
| 2 | { |
| 3 | flag1 = 01, |
| 4 | flag2 = 02, |
| 5 | flag3 = 04, |
| 6 | flag4 = 08, |
| 7 | flag5 = 010 |
| 8 | }; |
| 9 | |
| 10 | enum Flags2 |
| 11 | { |
| 12 | flag1 = 1 << 0, |
| 13 | flag2 = 1 << 1, |
| 14 | flag3 = 1 << 2, |
| 15 | flag4 = 1 << 3, |
| 16 | flag5 = 1 << 4 |
| 17 | }; |
enum Flags1 { flag1 = 001, flag2 = 002, flag3 = 004, flag4 = 010, flag5 = 020 };
Fixed.
What he was thinking in the first place:
enum Flags1 { flag1 = 0x1, flag2 = 0x2, flag3 = 0x4, flag4 = 0x8, flag5 = 0x10 };
nvm, need more sleep.
The bitwise NOT of a single bit is that bit inverted.
How do you perform this in C? Can I just do:
flags ~= MYFLAG;
I need to toggle bits from time to time and have been using flags ^= MYFLAG; // edited, I put ~ by mistake but I don't really know if that is the correct way to do it.
Can I just do:
flags ~= MYFLAG;
I need to toggle bits from time to time and have been using
flags ~= MYFLAG;
but I don't really know if that is the correct way to do it.
Only if you know MYFLAG is set.
The proper way would be flags &= ~MYFLAG.
The proper way would be flags &= ~MYFLAG.
Will that toggle MYFLAG on if it is not already on? Seems to me like it would only turn it off if it is on.
To flip specific bits of a number , Bitwise XOR the number with a number that has only the bits you want to flip set to 1.
// const int X = 1; const int Y = 2; const int Z = 4; int a = X | Z; int b = a ^ Y; int c = X | Y | Z; if (c == b) {XOR_1_Flips_Bits();} //
1 XOR 0 is 1 because they are different
0 XOR 0 is 0 because they are the same
So if you XOR a bit with 0 it remains the same.
0 XOR 1 is 1 because they are different
1 XOR 1 is 0 because they are the same
So if you XOR a bit with 1 it is flipped.
Sorry, I got confused. To turn bits off (which I though was what you wanted to know), do what I said. To toggle them, logical xor is indeed the correct way to do it.
I made a mistake in the code I posted - I used binary NOT (~) instead of XOR (^), and I compared a to c when I meant to compare b to c. It is corrected now.
On a different note, I thought XOR variable swapping was so cool when I first saw it, but it performs much more slowly than using temporary variables to swap the values.
TestIntSwapSpeed.cpp
| 1 | // |
| 2 | |
| 3 | void IntXorSwap(int& A , int& B) { |
| 4 | A ^= B;// A now holds the difference between the original A and B |
| 5 | B ^= A;// B now holds the original value of A since B = (B^A which equals A^B which equals {originalA ^ B}^B ) |
| 6 | A ^= B;// A now holds the original value of B since A = (A^B which equals B^A) and since A held the original difference between A and B |
| 7 | } |
| 8 | void IntTempSwap(int& A , int& B) { |
| 9 | int temp = A; |
| 10 | A = B; |
| 11 | B = temp; |
| 12 | } |
| 13 | |
| 14 | void TestXorSwap(int a , int b , unsigned int ntests) { |
| 15 | for (unsigned int i = 0 ; i < ntests ; ++i) { |
| 16 | IntXorSwap(a,b); |
| 17 | } |
| 18 | } |
| 19 | |
| 20 | void TestTempSwap(int a , int b , unsigned int ntests) { |
| 21 | for (unsigned int i = 0 ; i < ntests ; ++i) { |
| 22 | IntTempSwap(a,b); |
| 23 | } |
| 24 | } |
| 25 | |
| 26 | int main() { |
| 27 | |
| 28 | int inta = -75 , intb = 14011; |
| 29 | const unsigned int num_tests = 500000000;//500 million |
| 30 | |
| 31 | TestXorSwap(inta , intb , num_tests); |
| 32 | TestTempSwap(inta , intb , num_tests); |
| 33 | |
| 34 | return 0; |
| 35 | } |
| 36 | |
| 37 | // |
Profiling results :
Unoptimized
mingw32-g++ -Wall -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 31.05 3.94 3.94 500000000 0.00 0.00 IntXorSwap(int&, int&) 24.27 7.02 3.08 500000000 0.00 0.00 IntTempSwap(int&, int&) 22.62 9.89 2.87 1 2.87 5.95 TestTempSwap(int, int, unsigned int) 22.06 12.69 2.80 1 2.80 6.74 TestXorSwap(int, int, unsigned int)
Optimized with -O2
mingw32-g++ -Wall -pg -O2 -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 49.13 3.10 3.10 500000000 0.00 0.00 IntXorSwap(int&, int&) 27.42 4.83 1.73 500000000 0.00 0.00 IntTempSwap(int&, int&) 11.89 5.58 0.75 1 0.75 2.48 TestTempSwap(int, int, unsigned int) 11.57 6.31 0.73 1 0.73 3.83 TestXorSwap(int, int, unsigned int)
So unoptimized, 500,000,000 swaps using XOR took 3.94 seconds and using a temp variable took 3.08 seconds.
Optimized, 500,000,000 swaps using XOR took 3.10 seconds and using a temp variable took 1.73 seconds.
Percentage wise, unoptimized XOR swapping takes ~128% as much time as temp swapping, and optimized XOR swapping takes 180% as much time as temp swapping for integers.
Here's the assembly code MinGW produces when there's no profiling :
Unoptimized :
mingw32-g++ -Wall -S -o TestIntSwapSpeed_asm.txt TestIntSwapSpeed.cpp
| 1 | .file "TestIntSwapSpeed.cpp" |
| 2 | .text |
| 3 | .align 2 |
| 4 | .globl __Z10IntXorSwapRiS_ |
| 5 | .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef |
| 6 | __Z10IntXorSwapRiS_: |
| 7 | pushl %ebp |
| 8 | movl %esp, %ebp |
| 9 | movl 8(%ebp), %ecx |
| 10 | movl 8(%ebp), %edx |
| 11 | movl 12(%ebp), %eax |
| 12 | movl (%eax), %eax |
| 13 | xorl (%edx), %eax |
| 14 | movl %eax, (%ecx) |
| 15 | movl 12(%ebp), %ecx |
| 16 | movl 12(%ebp), %edx |
| 17 | movl 8(%ebp), %eax |
| 18 | movl (%eax), %eax |
| 19 | xorl (%edx), %eax |
| 20 | movl %eax, (%ecx) |
| 21 | movl 8(%ebp), %ecx |
| 22 | movl 8(%ebp), %edx |
| 23 | movl 12(%ebp), %eax |
| 24 | movl (%eax), %eax |
| 25 | xorl (%edx), %eax |
| 26 | movl %eax, (%ecx) |
| 27 | popl %ebp |
| 28 | ret |
| 29 | .align 2 |
| 30 | .globl __Z11IntTempSwapRiS_ |
| 31 | .def __Z11IntTempSwapRiS_; .scl 2; .type 32; .endef |
| 32 | __Z11IntTempSwapRiS_: |
| 33 | pushl %ebp |
| 34 | movl %esp, %ebp |
| 35 | subl $4, %esp |
| 36 | movl 8(%ebp), %eax |
| 37 | movl (%eax), %eax |
| 38 | movl %eax, -4(%ebp) |
| 39 | movl 8(%ebp), %edx |
| 40 | movl 12(%ebp), %eax |
| 41 | movl (%eax), %eax |
| 42 | movl %eax, (%edx) |
| 43 | movl 12(%ebp), %edx |
| 44 | movl -4(%ebp), %eax |
| 45 | movl %eax, (%edx) |
| 46 | leave |
| 47 | ret |
| 48 | .align 2 |
| 49 | .globl __Z11TestXorSwapiij |
| 50 | .def __Z11TestXorSwapiij; .scl 2; .type 32; .endef |
| 51 | __Z11TestXorSwapiij: |
| 52 | pushl %ebp |
| 53 | movl %esp, %ebp |
| 54 | subl $12, %esp |
| 55 | movl $0, -4(%ebp) |
| 56 | L4: |
| 57 | movl -4(%ebp), %eax |
| 58 | cmpl 16(%ebp), %eax |
| 59 | jae L3 |
| 60 | leal 12(%ebp), %eax |
| 61 | movl %eax, 4(%esp) |
| 62 | leal 8(%ebp), %eax |
| 63 | movl %eax, (%esp) |
| 64 | call __Z10IntXorSwapRiS_ |
| 65 | leal -4(%ebp), %eax |
| 66 | incl (%eax) |
| 67 | jmp L4 |
| 68 | L3: |
| 69 | leave |
| 70 | ret |
| 71 | .align 2 |
| 72 | .globl __Z12TestTempSwapiij |
| 73 | .def __Z12TestTempSwapiij; .scl 2; .type 32; .endef |
| 74 | __Z12TestTempSwapiij: |
| 75 | pushl %ebp |
| 76 | movl %esp, %ebp |
| 77 | subl $12, %esp |
| 78 | movl $0, -4(%ebp) |
| 79 | L8: |
| 80 | movl -4(%ebp), %eax |
| 81 | cmpl 16(%ebp), %eax |
| 82 | jae L7 |
| 83 | leal 12(%ebp), %eax |
| 84 | movl %eax, 4(%esp) |
| 85 | leal 8(%ebp), %eax |
| 86 | movl %eax, (%esp) |
| 87 | call __Z11IntTempSwapRiS_ |
| 88 | leal -4(%ebp), %eax |
| 89 | incl (%eax) |
| 90 | jmp L8 |
| 91 | L7: |
| 92 | leave |
| 93 | ret |
| 94 | .def ___main; .scl 2; .type 32; .endef |
| 95 | .align 2 |
| 96 | .globl _main |
| 97 | .def _main; .scl 2; .type 32; .endef |
| 98 | _main: |
| 99 | pushl %ebp |
| 100 | movl %esp, %ebp |
| 101 | subl $40, %esp |
| 102 | andl $-16, %esp |
| 103 | movl $0, %eax |
| 104 | addl $15, %eax |
| 105 | addl $15, %eax |
| 106 | shrl $4, %eax |
| 107 | sall $4, %eax |
| 108 | movl %eax, -16(%ebp) |
| 109 | movl -16(%ebp), %eax |
| 110 | call __alloca |
| 111 | call ___main |
| 112 | movl $-75, -4(%ebp) |
| 113 | movl $14011, -8(%ebp) |
| 114 | movl $500000000, -12(%ebp) |
| 115 | movl $500000000, 8(%esp) |
| 116 | movl -8(%ebp), %eax |
| 117 | movl %eax, 4(%esp) |
| 118 | movl -4(%ebp), %eax |
| 119 | movl %eax, (%esp) |
| 120 | call __Z11TestXorSwapiij |
| 121 | movl $500000000, 8(%esp) |
| 122 | movl -8(%ebp), %eax |
| 123 | movl %eax, 4(%esp) |
| 124 | movl -4(%ebp), %eax |
| 125 | movl %eax, (%esp) |
| 126 | call __Z12TestTempSwapiij |
| 127 | movl $0, %eax |
| 128 | leave |
| 129 | ret |
Optimized :
mingw32-g++ -Wall -O2 -S -o TestIntSwapSpeed_O2_asm.txt TestIntSwapSpeed.cpp
| 1 | .file "TestIntSwapSpeed.cpp" |
| 2 | .text |
| 3 | .align 2 |
| 4 | .p2align 4,,15 |
| 5 | .globl __Z10IntXorSwapRiS_ |
| 6 | .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef |
| 7 | __Z10IntXorSwapRiS_: |
| 8 | pushl %ebp |
| 9 | movl %esp, %ebp |
| 10 | movl 12(%ebp), %edx |
| 11 | movl 8(%ebp), %ecx |
| 12 | movl (%edx), %eax |
| 13 | xorl (%ecx), %eax |
| 14 | movl %eax, (%ecx) |
| 15 | xorl (%edx), %eax |
| 16 | movl %eax, (%edx) |
| 17 | xorl %eax, (%ecx) |
| 18 | popl %ebp |
| 19 | ret |
| 20 | .align 2 |
| 21 | .p2align 4,,15 |
| 22 | .globl __Z11IntTempSwapRiS_ |
| 23 | .def __Z11IntTempSwapRiS_; .scl 2; .type 32; .endef |
| 24 | __Z11IntTempSwapRiS_: |
| 25 | pushl %ebp |
| 26 | movl %esp, %ebp |
| 27 | movl 8(%ebp), %edx |
| 28 | pushl %ebx |
| 29 | movl 12(%ebp), %ecx |
| 30 | movl (%edx), %ebx |
| 31 | movl (%ecx), %eax |
| 32 | movl %eax, (%edx) |
| 33 | movl %ebx, (%ecx) |
| 34 | popl %ebx |
| 35 | popl %ebp |
| 36 | ret |
| 37 | .align 2 |
| 38 | .p2align 4,,15 |
| 39 | .globl __Z11TestXorSwapiij |
| 40 | .def __Z11TestXorSwapiij; .scl 2; .type 32; .endef |
| 41 | __Z11TestXorSwapiij: |
| 42 | pushl %ebp |
| 43 | movl %esp, %ebp |
| 44 | subl $20, %esp |
| 45 | movl %edi, -4(%ebp) |
| 46 | movl 16(%ebp), %edi |
| 47 | movl %ebx, -12(%ebp) |
| 48 | xorl %ebx, %ebx |
| 49 | cmpl %edi, %ebx |
| 50 | movl %esi, -8(%ebp) |
| 51 | jae L9 |
| 52 | leal 12(%ebp), %esi |
| 53 | .p2align 4,,15 |
| 54 | L7: |
| 55 | movl %esi, 4(%esp) |
| 56 | leal 8(%ebp), %eax |
| 57 | incl %ebx |
| 58 | movl %eax, (%esp) |
| 59 | call __Z10IntXorSwapRiS_ |
| 60 | cmpl %edi, %ebx |
| 61 | jb L7 |
| 62 | L9: |
| 63 | movl -12(%ebp), %ebx |
| 64 | movl -8(%ebp), %esi |
| 65 | movl -4(%ebp), %edi |
| 66 | movl %ebp, %esp |
| 67 | popl %ebp |
| 68 | ret |
| 69 | .align 2 |
| 70 | .p2align 4,,15 |
| 71 | .globl __Z12TestTempSwapiij |
| 72 | .def __Z12TestTempSwapiij; .scl 2; .type 32; .endef |
| 73 | __Z12TestTempSwapiij: |
| 74 | pushl %ebp |
| 75 | movl %esp, %ebp |
| 76 | subl $20, %esp |
| 77 | movl %edi, -4(%ebp) |
| 78 | movl 16(%ebp), %edi |
| 79 | movl %ebx, -12(%ebp) |
| 80 | xorl %ebx, %ebx |
| 81 | cmpl %edi, %ebx |
| 82 | movl %esi, -8(%ebp) |
| 83 | jae L17 |
| 84 | leal 12(%ebp), %esi |
| 85 | .p2align 4,,15 |
| 86 | L15: |
| 87 | movl %esi, 4(%esp) |
| 88 | leal 8(%ebp), %eax |
| 89 | incl %ebx |
| 90 | movl %eax, (%esp) |
| 91 | call __Z11IntTempSwapRiS_ |
| 92 | cmpl %edi, %ebx |
| 93 | jb L15 |
| 94 | L17: |
| 95 | movl -12(%ebp), %ebx |
| 96 | movl -8(%ebp), %esi |
| 97 | movl -4(%ebp), %edi |
| 98 | movl %ebp, %esp |
| 99 | popl %ebp |
| 100 | ret |
| 101 | .def ___main; .scl 2; .type 32; .endef |
| 102 | .align 2 |
| 103 | .p2align 4,,15 |
| 104 | .globl _main |
| 105 | .def _main; .scl 2; .type 32; .endef |
| 106 | _main: |
| 107 | pushl %ebp |
| 108 | movl $16, %eax |
| 109 | movl %esp, %ebp |
| 110 | subl $24, %esp |
| 111 | andl $-16, %esp |
| 112 | call __alloca |
| 113 | call ___main |
| 114 | movl $-75, (%esp) |
| 115 | movl $500000000, %eax |
| 116 | movl $14011, %ecx |
| 117 | movl %eax, 8(%esp) |
| 118 | movl %ecx, 4(%esp) |
| 119 | call __Z11TestXorSwapiij |
| 120 | movl $-75, (%esp) |
| 121 | movl $14011, %eax |
| 122 | movl $500000000, %edx |
| 123 | movl %eax, 4(%esp) |
| 124 | movl %edx, 8(%esp) |
| 125 | call __Z12TestTempSwapiij |
| 126 | leave |
| 127 | xorl %eax, %eax |
| 128 | ret |
Assembly operation tallies :
Version pushl popl movl xorl subl leave total ----------------------------------------------------------------------------------- Unoptimized XOR swap (__Z10IntXorSwapRiS_) 1 1 16 3 0 0 21 Temp swap (__Z11IntTempSwapRiS_) 1 0 11 0 1 1 14 Optimized XOR swap (__Z10IntXorSwapRiS_) 1 1 6 3 0 0 11 Temp swap (__Z11IntTempSwapRiS_) 2 2 7 0 0 0 11
The xorl operation must be pretty slow if both optimized versions take the same number of operations but the XOR swap takes 9/5 as long to complete.
The xor-swap is also missing an identity check (it assumes it has two variables for storage and won't work if A and B refer to the same variable).
void IntXorSwap(int& A , int& B) { if (&A != &B) { A ^= B; B ^= A; A ^= B; } }
You're right. The XOR method will be even slower then.
Unoptimized :
mingw32-g++ -Wall -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 34.43 4.61 4.61 500000000 0.00 0.00 IntXorSwap(int&, int&) 23.30 7.73 3.12 500000000 0.00 0.00 IntTempSwap(int&, int&) 22.03 10.68 2.95 1 2.95 7.56 TestXorSwap(int, int, unsigned int) 20.24 13.39 2.71 1 2.71 5.83 TestTempSwap(int, int, unsigned int)
Optimized with -O2
mingw32-g++ -Wall -O2 -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 53.09 2.92 2.92 500000000 0.00 0.00 IntXorSwap(int&, int&) 26.18 4.36 1.44 500000000 0.00 0.00 IntTempSwap(int&, int&) 13.45 5.10 0.74 1 0.74 2.18 TestTempSwap(int, int, unsigned int) 7.27 5.50 0.40 1 0.40 3.32 TestXorSwap(int, int, unsigned int)
Execution time ratio for updated XOR method to temp method :
Unoptimized : ~148%
Optimized : ~203%
Assembly output
(unoptimized and optimized versions of _Z10IntXorSwapRiS together)
| 1 | // |
| 2 | // unoptimized |
| 3 | .globl __Z10IntXorSwapRiS_ |
| 4 | .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef |
| 5 | __Z10IntXorSwapRiS_: |
| 6 | pushl %ebp |
| 7 | movl %esp, %ebp |
| 8 | movl 8(%ebp), %eax |
| 9 | cmpl 12(%ebp), %eax |
| 10 | je L1 |
| 11 | movl 8(%ebp), %ecx |
| 12 | movl 8(%ebp), %edx |
| 13 | movl 12(%ebp), %eax |
| 14 | movl (%eax), %eax |
| 15 | xorl (%edx), %eax |
| 16 | movl %eax, (%ecx) |
| 17 | movl 12(%ebp), %ecx |
| 18 | movl 12(%ebp), %edx |
| 19 | movl 8(%ebp), %eax |
| 20 | movl (%eax), %eax |
| 21 | xorl (%edx), %eax |
| 22 | movl %eax, (%ecx) |
| 23 | movl 8(%ebp), %ecx |
| 24 | movl 8(%ebp), %edx |
| 25 | movl 12(%ebp), %eax |
| 26 | movl (%eax), %eax |
| 27 | xorl (%edx), %eax |
| 28 | movl %eax, (%ecx) |
| 29 | L1: |
| 30 | popl %ebp |
| 31 | ret |
| 32 | .align 2 |
| 33 | |
| 34 | // optimized |
| 35 | .globl __Z10IntXorSwapRiS_ |
| 36 | .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef |
| 37 | __Z10IntXorSwapRiS_: |
| 38 | pushl %ebp |
| 39 | movl %esp, %ebp |
| 40 | movl 8(%ebp), %ecx |
| 41 | movl 12(%ebp), %edx |
| 42 | cmpl %edx, %ecx |
| 43 | je L1 |
| 44 | movl (%edx), %eax |
| 45 | xorl (%ecx), %eax |
| 46 | movl %eax, (%ecx) |
| 47 | xorl (%edx), %eax |
| 48 | movl %eax, (%edx) |
| 49 | xorl %eax, (%ecx) |
| 50 | L1: |
| 51 | popl %ebp |
| 52 | ret |
| 53 | .align 2 |
| 54 | .p2align 4,,15 |
| 55 | |
| 56 | // |
Tallies :
Version pushl popl cmpl movl xorl je total ----------------------------------------------------------------------------------- Unoptimized XOR swap (__Z10IntXorSwapRiS_) 1 1 1 17 3 1 24 Optimized XOR swap (__Z10IntXorSwapRiS_) 1 1 1 6 3 1 13
It's interesting to see that the -O2 version was able to trim off 11 movl operations. Is that the difference between keeping the data in the registers and checking it from memory?
On a different note, I thought XOR variable swapping was so cool when I first saw it, but it performs much more slowly than using temporary variables to swap the values.
Yep, it long ago stopped being a fast trick and is now just something worth mentioning to ensure that everyone's on the same page with respect to the commutative and associative properties of bitwise operators.
You will sometimes still see people using it, but as with a lot of things, it's mostly personal taste and probably marks them out as an older programmer; on something like a z80 where xors cost exactly the same as inter-register loads, three xors superficially costs the same in cycles but saves a valuable register. On modern x86 I think we all know that the 'registers' aren't actually registers but rather merely a shorthand to describe how subsequent values relate to earlier ones and the bottlenecks that actually cost are completely unrelated to how quickly you can swap values for any application that anybody actually cares about.
How about "a^=a" versus "a=0"? Is the former still faster? I can't imagine it is.
as with a lot of things, it's mostly personal taste and probably marks them out as an older programmer
A preference for obfuscated code? Hmm...
On modern x86 I think we all know that the 'registers' aren't actually registers but rather merely a shorthand to describe how subsequent values relate to earlier ones
I didn't know! Do you have a link where this is discussed in a bit more detail?
I didn't know! Do you have a link where this is discussed in a bit more detail?
Ummm, I was really just making oblique reference to the combination of feeding multiple simultaneous pipelines through data dependency calculation and, I guess, a little bit to caching. So if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation and not have to worry about the two strands being in synchronisation until something tries to combine data from both streams.
Add that to processors like the Pentium Pro on which the first-level cache is operating at the same speed as the CPU and you're really fighting to find a reason that the registers are any more significant than any of the other temporary stores hanging around a chip. Especially where hyperthreading making it less logical to have any sort of primary source of data for calculations.
That's before you start discussing how microcoding often maps the x86 instruction set to a more RISCy internal set for the actual processing unit that tends not to have a 1:1 correlation to the named x86 registers.
I would therefore distinguish modern registers from the registers of, e.g. the z80, because you can no longer say from looking at your assembly exactly what the state of the named registers is after every instruction. Nowadays you just work on being certain that the state would be reported as a certain thing were you to query it. Which is obviously compatible with the old model but completely changes the field of optimisation.
I'm talking generally because I did a general degree way back when and have not subsequently interested myself with specifics.
EDIT:
A preference for obfuscated code? Hmm...
I don't think it's obfuscated for them. It's just part of the way the brain recognises things; like the way it recognises sentence structures and can fix bits sometimes without you being consciously aware (which is annoying when you're an editor) - they're used to being able to instantly spot that a certain pattern does a certain thing to the extent where 'a ^= b; b ^= a; a ^= b;' just reads instantly as 'swap a and b', not 'a exclusive or with b, store result to a; ...'.
I was really just trying to say "you may see this around still, but it's not some sort of magical recipe and don't bother adapting yourself to it now".
I like xor-swap because it's easy to implement as a CPP macro--no passing pointers to a function and no temp variable. The only place I've ever used bitshifts and bitwise things like that is in a very simplistic implementation of ARC4; I'm not doing things like that much anymore.
I was really just making oblique reference to the combination of feeding multiple simultaneous pipelines through data dependency calculation and, I guess, a little bit to caching. So if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation and not have to worry about the two strands being in synchronisation until something tries to combine data from both streams.
Add that to processors like the Pentium Pro on which the first-level cache is operating at the same speed as the CPU and you're really fighting to find a reason that the registers are any more significant than any of the other temporary stores hanging around a chip. Especially where hyperthreading making it less logical to have any sort of primary source of data for calculations.
That's before you start discussing how microcoding often maps the x86 instruction set to a more RISCy internal set for the actual processing unit that tends not to have a 1:1 correlation to the named x86 registers.
Ah, yes! That makes sense, actually. I'd never thought about this (in terms of assembler language and processor architecture I never really went much beyond the 486 although I know (knew) a few basic things about the AMD64).
I'm talking generally because I did a general degree way back when and have not subsequently interested myself with specifics.
That's fine, this is exactly the right level of explanation that I was looking for. Thanks.
I like xor-swap because it's easy to implement as a CPP macro--no passing pointers to a function and no temp variable.
If you create a function, there are no temp variables (that you see or care about). I personally always use the std functions wherever possible:
int a = 1; int b = 2; std::swap (a, b);
if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation
Registers themselves have "shadow copies" (up to 128 for eax) so that if your lame compiler uses some particular register to do one set of operations, then uses it immediately afterward to do some other independant set of ops, it can use a shadow register to do the second set of ops before the first set has finished.
How does that std::swap work? It shouldn't be able to swap variables in the calling context without getting pointers to them, unless C++ is much weirder than I thought.
Ever heard of references? 
void change (int &var) { var++; } int main () { int var = 1; change(var); // var is now 2! \o/ }
Actually I haven't. I don't use C++; I only know it to the extent that it's based on C.
Well if you are interested here is a nice entry in C++ FAQ Lite.
How does that std::swap work? It shouldn't be able to swap variables in the calling context without getting pointers to them, unless C++ is much weirder than I thought.
A reference works like a pointer, but acts like an object.
mingw/include/c++/3.4.5/bits/stl_algobase.h lines 114 - 133
| 1 | // |
| 2 | /** |
| 3 | * @brief Swaps two values. |
| 4 | * @param a A thing of arbitrary type. |
| 5 | * @param b Another thing of arbitrary type. |
| 6 | * @return Nothing. |
| 7 | * |
| 8 | * This is the simple classic generic implementation. It will work on |
| 9 | * any type which has a copy constructor and an assignment operator. |
| 10 | */ |
| 11 | template<typename _Tp> |
| 12 | inline void |
| 13 | swap(_Tp& __a, _Tp& __b) |
| 14 | { |
| 15 | // concept requirements |
| 16 | __glibcxx_function_requires(_SGIAssignableConcept<_Tp>) |
| 17 | |
| 18 | const _Tp __tmp = __a; |
| 19 | __a = __b; |
| 20 | __b = __tmp; |
| 21 | } |
| 22 | |
| 23 | // |
That's just one of many swaps though, but it's possible they mostly just use that one in varying ways - grep for swap in your compiler include directory.
One way to use bits in numbers is as a set. Suppose that you have ten items and want to represent subsets of those items. Simply use the first ten bits of an int, where 1 means its included and 0 means it is not included.
To check if a particular item is included
subset_flags & (1 << index)
Flip it using
subset_flag ^= (1 << index)
However, the neatest part is iteration
for(int subset = 0; subset < ( 1 << count_of_items_in_set);subset++)
This will iterate through all possible subsets. But due to nature of order through which it will go through the possiblities, for every set under consideration, all possible subsets will have already undergone consideration.
I.e. for the subset
1101
All of the following are before it:
1100
1001
0101
0001
Topic's been dead for a few days I know, but I'll stick to with the plan anyway
.
There was alot of good reading in this so far. It seems that a few folk learnt something, so the thread is working
.
Cheers to anyone who contributed this week.
I was intrigued by Thomas Harte's idea of putting this stuff into the wiki if the powers that be feel the information inputted here is valuable enough so far?
If so, I'd be happy to type it up in article form myself if no-one else would rather do it (giving credit to the original contributors of course
).
Either way however, the topic changes tomorrow. Has anyone got any suggestion for a new one they feel there's plenty of dicussion potential in?
Maybe you already noticed I tried to make a wiki entry under "bitwise operators".
But now I can't log-in anymore. I attached my second draft version you might use to complete it.(Notice I dubbed it "Thursday's topic".)
The reason last topic went so well however, is that there's little discussion about these things. And there is verifiable information. If I suggested "project(file)s" or "Open source game development", your average discussion chaos would occur, procuring any effort to transpose it into the wiki.
Oh, what also would be nice if you updated (in the future) your first post to have an index.
Ahh, I seen that topic but i never noticed it was related to this topic haha.
I can't check our your attachment from where I am right now, but I'll have a look tonight and see if i can finish it off/ post it on the wiki if you're cool with that
.
About the subject matter, i think you're right. I'd love to hear what people have to say about proper OOP (I just can't get it right!
), but the subject is covered in depth in so many places that i don't think anyone would really feel it's worth it to offer their two cents.
One idea I had been thinking about for discussion this week is function pointers. Any objections or better ideas?
Will you be just carrying on this thread or starting a new one?
Without wanting to say anything of them before Thursday, I can think of at least five things to say about function pointers...
Wonderful. I look forward to hearing them
.
I wasn't so sure about whether to start a new thread or not though.
On one hand it would be clearer, although on the other it may be regarded as 'spammy'.
1 Thread once a week is definitely not spammy. Open up a new thread to avoid confusion and mixing up subjects.
Thy will be done
.