Allegro.cc - Online Community

Allegro.cc Forums » Off-Topic Ordeals » Discussion Thursdays?

This thread is locked; no one can reply to it. rss feed Print
 1   2 
Discussion Thursdays?
Sol Blast
Member #9,655
April 2008
avatar

Good Morning! (From Ireland anyway.)

I've had this idea for a thread for a while, ignore it if you don't think it's a good one.
Basically, with all the valuable information I've picked up about concepts of C/C++ on this forum in assorted threads that I'd of otherwise have had no interest to actively look up, I've been left somewhat blood thirsty for more.
I thought it might be an idea for a thread like this one to come along and choose a topic/ concept of programming, for any member to enter and share their knowledge, experiences, warnings, tips, how it works internally and so on. We discuss that topic in depth to the exclusion of all others for the next week, and then choose another.
Every Thursday the topic will change decided by suggestions from those who've contributed.

Personally I think it'd be a great way for everyone to expand their knowledge and learn about things it never occurred to them to learn about before.

I'll get the ball rolling for this week by suggesting we discuss... Bitwise Operators? I'll leave it to the first proper contributor to choose the definite topic if they have a better idea though.
I myself am too much of a newbie to have anything useful to add of course, but I look forward to reading your replies :).

Thomas Harte
Member #33
April 2000
avatar

Ummm, not much to say really...

All numbers are stored as a binary code, i.e. in base 2.

If you bitwise OR two bits then the result is set iff either of the two input bits were set.

If you bitwise AND two bits then the result is set iff both of the input bits were set.

If you bitwise EXCLUSIVE-OR two bits then the result is set iff exactly one of the input bits was set.

The bitwise NOT of a single bit is that bit inverted.

The effect of carrying out a bitwise operation using integers is that the bitwise operation is carried out on each digit separately.

The C operators are '|' for OR, '&' for AND, '^' for EXCLUSIVE-OR and '~' for NOT. |, & and ^ are binary operators, ~ is a unary operator.

Bitwise operations are often seen in C/C++ programs to represent flags. For example:

1#define FLAG_1 1
2#define FLAG_2 2
3#define FLAG_3 4 /* these are powers of two */
4<etc>
5 
6...
7 
8int FlagField = FLAG_1 | FLAG_2; // FlagField has flags 1 and 2 set
9 
10...
11 
12if(FlagField&FLAG_3)
13{
14 // this code is executed iff flag 3 is set
15}
16 
17if(FlagField&FLAG_2)
18{
19 // this code is executed iff flag 2 is set
20}
21 
22if(FlagField&(FLAG_1|FLAG_2))
23{
24 // this code is executed if either FLAG_1 or FLAG_2 is set
25}

Another common construction is this:

if(NewVar^OldVar&NewVar&FLAG_1)
{
   // executes only if flag 1 is set in NewVar but not in OldVar
}
OldVar = NewVar;

Which is fairly easy to understand. The EXCLUSIVE OR effectively produces a number with a bit set everywhere a bit has changed between OldVar and NewVar, the AND with NewVar keeps only those bits that have changed and are set in NewVar, and the AND with FLAG_1 throws away all those bits that aren't flag 1. Then the usual C rule of 'execute if expression is non-zero' takes the code into the if statement.

Ummm, I guess you also sometimes see:

a ^= b;
b ^= a;
a ^= b;

To do an in-place swap of the values of a and b. That works because binary operators are commutative and associative - b ^ a ^ b = a ^ b ^ b = a ^ (b ^ b) = a ^ 0 = a. And in that case we go from:

  • a = a, b = b; to

  • a = a^b, b = b; to

  • a = a^b, b = b^a^b; to

  • a = a^b^b^a^b, b = b^a^b

So at the end, a = a^b^b^a^b = (a^a) ^ (b^b) ^ b = 0 ^ 0 ^ b = b, and b = b ^ a ^ b = b ^ b ^ a = 0 ^ a = a.

It's rare that you need to do bitwise stuff other than to communicate with hardware at the lowest level, but lots of programmers with lower level experience put a lot of it around because it just matches how they think about things.

Probably lots more to say, but with these things it's all just directly consequential from the definition...

EDIT: P.s. neat idea, especially if someone is going to start transposing this stuff into the wiki?

alethiophile
Member #9,349
December 2007
avatar

The first ten or so times I saw the 'if and only if' construction 'iff', I thought it was an unusually persistent typo. :P

The main time I used bitwise operators was when trying to pack/unpack bytes from a long; I ended up using a union eventually.

--
Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup.
C++: An octopus made by nailing extra legs onto a dog.
I am the Lightning-Struck Penguin of Doom.

Evert
Member #794
November 2000
avatar

I use bitwise operators a lot to check/set/clear bit-encoded flags, as TH stated. I tend to use hexadecimal notation rather than decimal when defining flags, mainly because it's more closely linked to the lower level bitwise encoding.

My main source of frustration is trying to use bit flags in FORTRAN. Let's just say it wasn't designed with bitwise operators in mind; it uses functions to provide similar functionality: BTEST, DSHIFTL, DSHIFTR, IAND, IBCHNG, IBCLR, IBITS, IBSET, IEOR, IOR, ISHA, ISHC, ISHFT, ISHFTC, ISHL, IXOR, LSHIFT, NOT, RSHIFT , SHIFTL, SHIFTR.
I often find myself wishing for the simplicity of bitwise operators in FORTRAN... they make the code look so much more compact and easier to follow.

SiegeLord
Member #7,827
October 2006
avatar

I either use octal (I find it easier to read the hexadecimal for these cases) when defining my flags, or bitwise shifts:

1enum Flags1
2{
3 flag1 = 01,
4 flag2 = 02,
5 flag3 = 04,
6 flag4 = 08,
7 flag5 = 010
8};
9 
10enum Flags2
11{
12 flag1 = 1 << 0,
13 flag2 = 1 << 1,
14 flag3 = 1 << 2,
15 flag4 = 1 << 3,
16 flag5 = 1 << 4
17};

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

bamccaig
Member #7,536
July 2006
avatar

SiegeLord said:

enum Flags1
{
    flag1 = 001,
    flag2 = 002,
    flag3 = 004,
    flag4 = 010,
    flag5 = 020
};

Fixed. ;)

Peter Wang
Member #23
April 2000

What he was thinking in the first place:

enum Flags1
{
    flag1 = 0x1,
    flag2 = 0x2,
    flag3 = 0x4,
    flag4 = 0x8,
    flag5 = 0x10
};

SiegeLord
Member #7,827
October 2006
avatar

nvm, need more sleep.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Todd Cope
Member #998
November 2000
avatar

Quote:

The bitwise NOT of a single bit is that bit inverted.

How do you perform this in C? Can I just do:

flags ~= MYFLAG;

I need to toggle bits from time to time and have been using flags ^= MYFLAG; // edited, I put ~ by mistake but I don't really know if that is the correct way to do it.

Evert
Member #794
November 2000
avatar

Quote:

Can I just do:

flags ~= MYFLAG;

I need to toggle bits from time to time and have been using

flags ~= MYFLAG;

but I don't really know if that is the correct way to do it.

Only if you know MYFLAG is set.
The proper way would be flags &= ~MYFLAG.

Todd Cope
Member #998
November 2000
avatar

Quote:

The proper way would be flags &= ~MYFLAG.

Will that toggle MYFLAG on if it is not already on? Seems to me like it would only turn it off if it is on.

Edgar Reynaldo
Major Reynaldo
May 2007
avatar

To flip specific bits of a number , Bitwise XOR the number with a number that has only the bits you want to flip set to 1.

//
const int X = 1;
const int Y = 2;
const int Z = 4;

int a = X | Z;
int b = a ^ Y;
int c = X | Y | Z;
if (c == b) {XOR_1_Flips_Bits();}
//

1 XOR 0 is 1 because they are different
0 XOR 0 is 0 because they are the same
So if you XOR a bit with 0 it remains the same.

0 XOR 1 is 1 because they are different
1 XOR 1 is 0 because they are the same
So if you XOR a bit with 1 it is flipped.

Evert
Member #794
November 2000
avatar

Sorry, I got confused. To turn bits off (which I though was what you wanted to know), do what I said. To toggle them, logical xor is indeed the correct way to do it.

Edgar Reynaldo
Major Reynaldo
May 2007
avatar

I made a mistake in the code I posted - I used binary NOT (~) instead of XOR (^), and I compared a to c when I meant to compare b to c. It is corrected now.

On a different note, I thought XOR variable swapping was so cool when I first saw it, but it performs much more slowly than using temporary variables to swap the values.

TestIntSwapSpeed.cpp

1//
2 
3void IntXorSwap(int& A , int& B) {
4 A ^= B;// A now holds the difference between the original A and B
5 B ^= A;// B now holds the original value of A since B = (B^A which equals A^B which equals {originalA ^ B}^B )
6 A ^= B;// A now holds the original value of B since A = (A^B which equals B^A) and since A held the original difference between A and B
7}
8void IntTempSwap(int& A , int& B) {
9 int temp = A;
10 A = B;
11 B = temp;
12}
13 
14void TestXorSwap(int a , int b , unsigned int ntests) {
15 for (unsigned int i = 0 ; i < ntests ; ++i) {
16 IntXorSwap(a,b);
17 }
18}
19 
20void TestTempSwap(int a , int b , unsigned int ntests) {
21 for (unsigned int i = 0 ; i < ntests ; ++i) {
22 IntTempSwap(a,b);
23 }
24}
25 
26int main() {
27
28 int inta = -75 , intb = 14011;
29 const unsigned int num_tests = 500000000;//500 million
30
31 TestXorSwap(inta , intb , num_tests);
32 TestTempSwap(inta , intb , num_tests);
33
34 return 0;
35}
36 
37//

Profiling results :

Unoptimized

mingw32-g++ -Wall -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 31.05      3.94     3.94 500000000     0.00     0.00  IntXorSwap(int&, int&)
 24.27      7.02     3.08 500000000     0.00     0.00  IntTempSwap(int&, int&)
 22.62      9.89     2.87        1     2.87     5.95  TestTempSwap(int, int, unsigned int)
 22.06     12.69     2.80        1     2.80     6.74  TestXorSwap(int, int, unsigned int)

Optimized with -O2

mingw32-g++ -Wall -pg -O2 -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 49.13      3.10     3.10 500000000     0.00     0.00  IntXorSwap(int&, int&)
 27.42      4.83     1.73 500000000     0.00     0.00  IntTempSwap(int&, int&)
 11.89      5.58     0.75        1     0.75     2.48  TestTempSwap(int, int, unsigned int)
 11.57      6.31     0.73        1     0.73     3.83  TestXorSwap(int, int, unsigned int)

So unoptimized, 500,000,000 swaps using XOR took 3.94 seconds and using a temp variable took 3.08 seconds.
Optimized, 500,000,000 swaps using XOR took 3.10 seconds and using a temp variable took 1.73 seconds.

Percentage wise, unoptimized XOR swapping takes ~128% as much time as temp swapping, and optimized XOR swapping takes 180% as much time as temp swapping for integers.

Here's the assembly code MinGW produces when there's no profiling :

Unoptimized :

mingw32-g++ -Wall -S -o TestIntSwapSpeed_asm.txt TestIntSwapSpeed.cpp

1 .file "TestIntSwapSpeed.cpp"
2 .text
3 .align 2
4.globl __Z10IntXorSwapRiS_
5 .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
6__Z10IntXorSwapRiS_:
7 pushl %ebp
8 movl %esp, %ebp
9 movl 8(%ebp), %ecx
10 movl 8(%ebp), %edx
11 movl 12(%ebp), %eax
12 movl (%eax), %eax
13 xorl (%edx), %eax
14 movl %eax, (%ecx)
15 movl 12(%ebp), %ecx
16 movl 12(%ebp), %edx
17 movl 8(%ebp), %eax
18 movl (%eax), %eax
19 xorl (%edx), %eax
20 movl %eax, (%ecx)
21 movl 8(%ebp), %ecx
22 movl 8(%ebp), %edx
23 movl 12(%ebp), %eax
24 movl (%eax), %eax
25 xorl (%edx), %eax
26 movl %eax, (%ecx)
27 popl %ebp
28 ret
29 .align 2
30.globl __Z11IntTempSwapRiS_
31 .def __Z11IntTempSwapRiS_; .scl 2; .type 32; .endef
32__Z11IntTempSwapRiS_:
33 pushl %ebp
34 movl %esp, %ebp
35 subl $4, %esp
36 movl 8(%ebp), %eax
37 movl (%eax), %eax
38 movl %eax, -4(%ebp)
39 movl 8(%ebp), %edx
40 movl 12(%ebp), %eax
41 movl (%eax), %eax
42 movl %eax, (%edx)
43 movl 12(%ebp), %edx
44 movl -4(%ebp), %eax
45 movl %eax, (%edx)
46 leave
47 ret
48 .align 2
49.globl __Z11TestXorSwapiij
50 .def __Z11TestXorSwapiij; .scl 2; .type 32; .endef
51__Z11TestXorSwapiij:
52 pushl %ebp
53 movl %esp, %ebp
54 subl $12, %esp
55 movl $0, -4(%ebp)
56L4:
57 movl -4(%ebp), %eax
58 cmpl 16(%ebp), %eax
59 jae L3
60 leal 12(%ebp), %eax
61 movl %eax, 4(%esp)
62 leal 8(%ebp), %eax
63 movl %eax, (%esp)
64 call __Z10IntXorSwapRiS_
65 leal -4(%ebp), %eax
66 incl (%eax)
67 jmp L4
68L3:
69 leave
70 ret
71 .align 2
72.globl __Z12TestTempSwapiij
73 .def __Z12TestTempSwapiij; .scl 2; .type 32; .endef
74__Z12TestTempSwapiij:
75 pushl %ebp
76 movl %esp, %ebp
77 subl $12, %esp
78 movl $0, -4(%ebp)
79L8:
80 movl -4(%ebp), %eax
81 cmpl 16(%ebp), %eax
82 jae L7
83 leal 12(%ebp), %eax
84 movl %eax, 4(%esp)
85 leal 8(%ebp), %eax
86 movl %eax, (%esp)
87 call __Z11IntTempSwapRiS_
88 leal -4(%ebp), %eax
89 incl (%eax)
90 jmp L8
91L7:
92 leave
93 ret
94 .def ___main; .scl 2; .type 32; .endef
95 .align 2
96.globl _main
97 .def _main; .scl 2; .type 32; .endef
98_main:
99 pushl %ebp
100 movl %esp, %ebp
101 subl $40, %esp
102 andl $-16, %esp
103 movl $0, %eax
104 addl $15, %eax
105 addl $15, %eax
106 shrl $4, %eax
107 sall $4, %eax
108 movl %eax, -16(%ebp)
109 movl -16(%ebp), %eax
110 call __alloca
111 call ___main
112 movl $-75, -4(%ebp)
113 movl $14011, -8(%ebp)
114 movl $500000000, -12(%ebp)
115 movl $500000000, 8(%esp)
116 movl -8(%ebp), %eax
117 movl %eax, 4(%esp)
118 movl -4(%ebp), %eax
119 movl %eax, (%esp)
120 call __Z11TestXorSwapiij
121 movl $500000000, 8(%esp)
122 movl -8(%ebp), %eax
123 movl %eax, 4(%esp)
124 movl -4(%ebp), %eax
125 movl %eax, (%esp)
126 call __Z12TestTempSwapiij
127 movl $0, %eax
128 leave
129 ret

Optimized :

mingw32-g++ -Wall -O2 -S -o TestIntSwapSpeed_O2_asm.txt TestIntSwapSpeed.cpp

1 .file "TestIntSwapSpeed.cpp"
2 .text
3 .align 2
4 .p2align 4,,15
5.globl __Z10IntXorSwapRiS_
6 .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
7__Z10IntXorSwapRiS_:
8 pushl %ebp
9 movl %esp, %ebp
10 movl 12(%ebp), %edx
11 movl 8(%ebp), %ecx
12 movl (%edx), %eax
13 xorl (%ecx), %eax
14 movl %eax, (%ecx)
15 xorl (%edx), %eax
16 movl %eax, (%edx)
17 xorl %eax, (%ecx)
18 popl %ebp
19 ret
20 .align 2
21 .p2align 4,,15
22.globl __Z11IntTempSwapRiS_
23 .def __Z11IntTempSwapRiS_; .scl 2; .type 32; .endef
24__Z11IntTempSwapRiS_:
25 pushl %ebp
26 movl %esp, %ebp
27 movl 8(%ebp), %edx
28 pushl %ebx
29 movl 12(%ebp), %ecx
30 movl (%edx), %ebx
31 movl (%ecx), %eax
32 movl %eax, (%edx)
33 movl %ebx, (%ecx)
34 popl %ebx
35 popl %ebp
36 ret
37 .align 2
38 .p2align 4,,15
39.globl __Z11TestXorSwapiij
40 .def __Z11TestXorSwapiij; .scl 2; .type 32; .endef
41__Z11TestXorSwapiij:
42 pushl %ebp
43 movl %esp, %ebp
44 subl $20, %esp
45 movl %edi, -4(%ebp)
46 movl 16(%ebp), %edi
47 movl %ebx, -12(%ebp)
48 xorl %ebx, %ebx
49 cmpl %edi, %ebx
50 movl %esi, -8(%ebp)
51 jae L9
52 leal 12(%ebp), %esi
53 .p2align 4,,15
54L7:
55 movl %esi, 4(%esp)
56 leal 8(%ebp), %eax
57 incl %ebx
58 movl %eax, (%esp)
59 call __Z10IntXorSwapRiS_
60 cmpl %edi, %ebx
61 jb L7
62L9:
63 movl -12(%ebp), %ebx
64 movl -8(%ebp), %esi
65 movl -4(%ebp), %edi
66 movl %ebp, %esp
67 popl %ebp
68 ret
69 .align 2
70 .p2align 4,,15
71.globl __Z12TestTempSwapiij
72 .def __Z12TestTempSwapiij; .scl 2; .type 32; .endef
73__Z12TestTempSwapiij:
74 pushl %ebp
75 movl %esp, %ebp
76 subl $20, %esp
77 movl %edi, -4(%ebp)
78 movl 16(%ebp), %edi
79 movl %ebx, -12(%ebp)
80 xorl %ebx, %ebx
81 cmpl %edi, %ebx
82 movl %esi, -8(%ebp)
83 jae L17
84 leal 12(%ebp), %esi
85 .p2align 4,,15
86L15:
87 movl %esi, 4(%esp)
88 leal 8(%ebp), %eax
89 incl %ebx
90 movl %eax, (%esp)
91 call __Z11IntTempSwapRiS_
92 cmpl %edi, %ebx
93 jb L15
94L17:
95 movl -12(%ebp), %ebx
96 movl -8(%ebp), %esi
97 movl -4(%ebp), %edi
98 movl %ebp, %esp
99 popl %ebp
100 ret
101 .def ___main; .scl 2; .type 32; .endef
102 .align 2
103 .p2align 4,,15
104.globl _main
105 .def _main; .scl 2; .type 32; .endef
106_main:
107 pushl %ebp
108 movl $16, %eax
109 movl %esp, %ebp
110 subl $24, %esp
111 andl $-16, %esp
112 call __alloca
113 call ___main
114 movl $-75, (%esp)
115 movl $500000000, %eax
116 movl $14011, %ecx
117 movl %eax, 8(%esp)
118 movl %ecx, 4(%esp)
119 call __Z11TestXorSwapiij
120 movl $-75, (%esp)
121 movl $14011, %eax
122 movl $500000000, %edx
123 movl %eax, 4(%esp)
124 movl %edx, 8(%esp)
125 call __Z12TestTempSwapiij
126 leave
127 xorl %eax, %eax
128 ret

Assembly operation tallies :

Version                           pushl  popl  movl  xorl  subl leave   total
-----------------------------------------------------------------------------------
Unoptimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1    16     3     0     0      21
Temp swap (__Z11IntTempSwapRiS_)      1     0    11     0     1     1      14     

Optimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1     6     3     0     0      11
Temp swap (__Z11IntTempSwapRiS_)      2     2     7     0     0     0      11

The xorl operation must be pretty slow if both optimized versions take the same number of operations but the XOR swap takes 9/5 as long to complete.

anonymous
Member #8025
November 2006

The xor-swap is also missing an identity check (it assumes it has two variables for storage and won't work if A and B refer to the same variable).

void IntXorSwap(int& A , int& B) 
{
    if (&A != &B) {
        A ^= B;
        B ^= A;
        A ^= B;
    }
}

Edgar Reynaldo
Major Reynaldo
May 2007
avatar

You're right. The XOR method will be even slower then.

Unoptimized :

mingw32-g++ -Wall -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 34.43      4.61     4.61 500000000     0.00     0.00  IntXorSwap(int&, int&)
 23.30      7.73     3.12 500000000     0.00     0.00  IntTempSwap(int&, int&)
 22.03     10.68     2.95        1     2.95     7.56  TestXorSwap(int, int, unsigned int)
 20.24     13.39     2.71        1     2.71     5.83  TestTempSwap(int, int, unsigned int)

Optimized with -O2

mingw32-g++ -Wall -O2 -pg -o TestIntSwapSpeed.exe TestIntSwapSpeed.cpp

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 53.09      2.92     2.92 500000000     0.00     0.00  IntXorSwap(int&, int&)
 26.18      4.36     1.44 500000000     0.00     0.00  IntTempSwap(int&, int&)
 13.45      5.10     0.74        1     0.74     2.18  TestTempSwap(int, int, unsigned int)
  7.27      5.50     0.40        1     0.40     3.32  TestXorSwap(int, int, unsigned int)

Execution time ratio for updated XOR method to temp method :
Unoptimized : ~148%
Optimized : ~203%

Assembly output
(unoptimized and optimized versions of _Z10IntXorSwapRiS together)

1//
2// unoptimized
3.globl __Z10IntXorSwapRiS_
4 .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
5__Z10IntXorSwapRiS_:
6 pushl %ebp
7 movl %esp, %ebp
8 movl 8(%ebp), %eax
9 cmpl 12(%ebp), %eax
10 je L1
11 movl 8(%ebp), %ecx
12 movl 8(%ebp), %edx
13 movl 12(%ebp), %eax
14 movl (%eax), %eax
15 xorl (%edx), %eax
16 movl %eax, (%ecx)
17 movl 12(%ebp), %ecx
18 movl 12(%ebp), %edx
19 movl 8(%ebp), %eax
20 movl (%eax), %eax
21 xorl (%edx), %eax
22 movl %eax, (%ecx)
23 movl 8(%ebp), %ecx
24 movl 8(%ebp), %edx
25 movl 12(%ebp), %eax
26 movl (%eax), %eax
27 xorl (%edx), %eax
28 movl %eax, (%ecx)
29L1:
30 popl %ebp
31 ret
32 .align 2
33 
34// optimized
35.globl __Z10IntXorSwapRiS_
36 .def __Z10IntXorSwapRiS_; .scl 2; .type 32; .endef
37__Z10IntXorSwapRiS_:
38 pushl %ebp
39 movl %esp, %ebp
40 movl 8(%ebp), %ecx
41 movl 12(%ebp), %edx
42 cmpl %edx, %ecx
43 je L1
44 movl (%edx), %eax
45 xorl (%ecx), %eax
46 movl %eax, (%ecx)
47 xorl (%edx), %eax
48 movl %eax, (%edx)
49 xorl %eax, (%ecx)
50L1:
51 popl %ebp
52 ret
53 .align 2
54 .p2align 4,,15
55 
56//

Tallies :

Version                           pushl  popl  cmpl  movl  xorl    je   total
-----------------------------------------------------------------------------------
Unoptimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1     1    17     3     1      24
Optimized
XOR swap  (__Z10IntXorSwapRiS_)       1     1     1     6     3     1      13

It's interesting to see that the -O2 version was able to trim off 11 movl operations. Is that the difference between keeping the data in the registers and checking it from memory?

Thomas Harte
Member #33
April 2000
avatar

Quote:

On a different note, I thought XOR variable swapping was so cool when I first saw it, but it performs much more slowly than using temporary variables to swap the values.

Yep, it long ago stopped being a fast trick and is now just something worth mentioning to ensure that everyone's on the same page with respect to the commutative and associative properties of bitwise operators.

You will sometimes still see people using it, but as with a lot of things, it's mostly personal taste and probably marks them out as an older programmer; on something like a z80 where xors cost exactly the same as inter-register loads, three xors superficially costs the same in cycles but saves a valuable register. On modern x86 I think we all know that the 'registers' aren't actually registers but rather merely a shorthand to describe how subsequent values relate to earlier ones and the bottlenecks that actually cost are completely unrelated to how quickly you can swap values for any application that anybody actually cares about.

Evert
Member #794
November 2000
avatar

How about "a^=a" versus "a=0"? Is the former still faster? I can't imagine it is.

Quote:

as with a lot of things, it's mostly personal taste and probably marks them out as an older programmer

A preference for obfuscated code? Hmm...

Quote:

On modern x86 I think we all know that the 'registers' aren't actually registers but rather merely a shorthand to describe how subsequent values relate to earlier ones

I didn't know! Do you have a link where this is discussed in a bit more detail?

Thomas Harte
Member #33
April 2000
avatar

Quote:

I didn't know! Do you have a link where this is discussed in a bit more detail?

Ummm, I was really just making oblique reference to the combination of feeding multiple simultaneous pipelines through data dependency calculation and, I guess, a little bit to caching. So if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation and not have to worry about the two strands being in synchronisation until something tries to combine data from both streams.

Add that to processors like the Pentium Pro on which the first-level cache is operating at the same speed as the CPU and you're really fighting to find a reason that the registers are any more significant than any of the other temporary stores hanging around a chip. Especially where hyperthreading making it less logical to have any sort of primary source of data for calculations.

That's before you start discussing how microcoding often maps the x86 instruction set to a more RISCy internal set for the actual processing unit that tends not to have a 1:1 correlation to the named x86 registers.

I would therefore distinguish modern registers from the registers of, e.g. the z80, because you can no longer say from looking at your assembly exactly what the state of the named registers is after every instruction. Nowadays you just work on being certain that the state would be reported as a certain thing were you to query it. Which is obviously compatible with the old model but completely changes the field of optimisation.

I'm talking generally because I did a general degree way back when and have not subsequently interested myself with specifics.

EDIT:

Quote:

A preference for obfuscated code? Hmm...

I don't think it's obfuscated for them. It's just part of the way the brain recognises things; like the way it recognises sentence structures and can fix bits sometimes without you being consciously aware (which is annoying when you're an editor) - they're used to being able to instantly spot that a certain pattern does a certain thing to the extent where 'a ^= b; b ^= a; a ^= b;' just reads instantly as 'swap a and b', not 'a exclusive or with b, store result to a; ...'.

I was really just trying to say "you may see this around still, but it's not some sort of magical recipe and don't bother adapting yourself to it now".

alethiophile
Member #9,349
December 2007
avatar

I like xor-swap because it's easy to implement as a CPP macro--no passing pointers to a function and no temp variable. The only place I've ever used bitshifts and bitwise things like that is in a very simplistic implementation of ARC4; I'm not doing things like that much anymore.

--
Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup.
C++: An octopus made by nailing extra legs onto a dog.
I am the Lightning-Struck Penguin of Doom.

Evert
Member #794
November 2000
avatar

Quote:

I was really just making oblique reference to the combination of feeding multiple simultaneous pipelines through data dependency calculation and, I guess, a little bit to caching. So if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation and not have to worry about the two strands being in synchronisation until something tries to combine data from both streams.

Add that to processors like the Pentium Pro on which the first-level cache is operating at the same speed as the CPU and you're really fighting to find a reason that the registers are any more significant than any of the other temporary stores hanging around a chip. Especially where hyperthreading making it less logical to have any sort of primary source of data for calculations.

That's before you start discussing how microcoding often maps the x86 instruction set to a more RISCy internal set for the actual processing unit that tends not to have a 1:1 correlation to the named x86 registers.

Ah, yes! That makes sense, actually. I'd never thought about this (in terms of assembler language and processor architecture I never really went much beyond the 486 although I know (knew) a few basic things about the AMD64).

Quote:

I'm talking generally because I did a general degree way back when and have not subsequently interested myself with specifics.

That's fine, this is exactly the right level of explanation that I was looking for. Thanks. :)

Vanneto
Member #8,643
May 2007

Quote:

I like xor-swap because it's easy to implement as a CPP macro--no passing pointers to a function and no temp variable.

If you create a function, there are no temp variables (that you see or care about). I personally always use the std functions wherever possible:

int a = 1;
int b = 2;

std::swap (a, b);

In capitalist America bank robs you.

Roy Underthump
Member #10,398
November 2008
avatar

Quote:

if a CPU spots that registers A, B and C are all being used to do one calculation then D, E and F are used to do another that doesn't depend on the results of the A, B, C calculation then it can often do the A, B and C calculation simultaneously to the D, E and F calculation

Registers themselves have "shadow copies" (up to 128 for eax) so that if your lame compiler uses some particular register to do one set of operations, then uses it immediately afterward to do some other independant set of ops, it can use a shadow register to do the second set of ops before the first set has finished.

I love America -- I love the rights we used to have
George Carlin

alethiophile
Member #9,349
December 2007
avatar

How does that std::swap work? It shouldn't be able to swap variables in the calling context without getting pointers to them, unless C++ is much weirder than I thought.

--
Do not meddle in the affairs of dragons, for you are crunchy and taste good with ketchup.
C++: An octopus made by nailing extra legs onto a dog.
I am the Lightning-Struck Penguin of Doom.

Vanneto
Member #8,643
May 2007

Ever heard of references? :P

void change (int &var)
{
    var++;
}

int main ()
{
   int var = 1;
   change(var);

   // var is now 2! \o/
}

In capitalist America bank robs you.

 1   2 


Go to: