|
|
This thread is locked; no one can reply to it.
|
1
2
|
| Preallocating arrays: specified size or next-power-of-two |
|
Arthur Kalliokoski
Second in Command
February 2005
|
http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 [EDIT] The memory management stuff in the computer accesses some certain size all the time, Access a char (byte) Nxxx xxxx xNxx xxxx //All fit within a 32 bit access xxNx xxxx xxxN xxxx xxxx Nxxx... Access a 16 bit int NNxx xxxx xNNx xxxx xxxN Nxxx <-crosses 32 bit boundary xxxx NNxx... same for 32 bit var NNNN xxxx xNNN Nxxx <-crosses . . . xxxN NNNx xxxx NNNN <-aligned again
They all watch too much MSNBC... they get ideas. |
|
Thomas Fjellstrom
Member #476
June 2000
|
Arthur Kalliokoski said: Please explain what an unaligned char is, please. Any access that doesn't start at an aligned address, just because the amount of data is smaller doesn't make it ok (its just not as bad). Oscar Giner said: Thomas: what you don't want is having a variable spawning over several "lines" of memory, because that'd mean that for reading 1 variable you'd need 2 memory accesses. I'm not disagreeing. Just that a single char read, not at an aligned boundary isn't as fast as one that is aligned. Some CPUs can't even access data thats unaligned without doing a larger read that is aligned, and then throwing away the bits it doesn't want. Just because the IA32 and x86_64 arch can do unaligned accesses, doesn't mean its as efficient as aligned accesses. -- |
|
Oscar Giner
Member #2,207
April 2002
|
Thomas Fjellstrom said: I'm not disagreeing. Just that a single char read, not at an aligned boundary isn't as fast as one that is aligned. Care to post your source? Because that's not true. Memory must be accessed word by word (that's called a line), it's not possible to read just one byte (well, unless you're working with an 8 bit cpu where a word is 1 byte). It doesn't matter where in the word the byte you want is, since the cpu must retrieve the entire word no matter what. I think you haven't even cared to read that wikipedia article because it explains how data is aligned pretty cleanly :/ -- |
|
Arthur Kalliokoski
Second in Command
February 2005
|
Maybe there's a situation where reading a char into AL register is faster if it's on the low byte (little endian)? They all watch too much MSNBC... they get ideas. |
|
Thomas Fjellstrom
Member #476
June 2000
|
Oscar Giner said: Care to post your source? Just something I've learned reading about how CPUs work. Quote: Memory must be accessed word by word (that's called a line) Uh, a "line" is generally larger, and depends on the CPU architecture, at least if you're talking about the cache line. My Phenom II 810 has a L3 line size of 64 bytes (not bits, 64 bits is the word size). Quote: I think you haven't even cared to read that wikipedia article because it explains how data is aligned pretty cleanly :/ Yeah, I didn't read it. I've skimmed it. But I can see how it can help to group and pack individual chars, so it fetches all 4 at the same time, saving some operations. Of course thats only if you group all your chars in the struct into 4-8 in a row groupings. Space them out a bunch, and its pretty inefficient (on x86/x86-64). And as I mentioned, some cpu's CAN'T access unaligned data at all. All you get is an exception, so if you attempt to access one of those packed chars, you have to fetch an entire word manually, and throw away the bits you don't want, which is what x86 does for you anyhow. -- |
|
OnlineCop
Member #7,919
October 2006
|
Not to rerail this topic or anything, but what should I do for this code? struct s_bound short left,right,top,bottom,btile; // 5 shorts, no change }; ... struct s_bound { short left,right,top,bottom,btile; short unused; // 1 short to give 6-shorts (or 3 ints) alignment }; ... struct s_bound { short left,right,top,bottom,btile; short unused[3]; // 3 shorts to give 8-shorts (or 4 ints) alignment }; I don't know whether this code will be run on 32- or 64-bit architecture. Or both. So would it be better to waste space and try to align the data on a 64-bit model, or pad it for a 32-bit model? Or write a preprocessor macro which would fill in the appropriate number depending? #ifdef 64_bit_architecture # define PADDING(bytes_used) char unused[8-(bytes_used)] #else # define PADDING(bytes_used) char unused[4-(bytes_used)] #endif struct s_bound { short left,right,top,bottom,btile; PADDING(6); // 6 bytes, or 3 shorts, to give a total of 8 shorts (4 ints) };
|
|
Arthur Kalliokoski
Second in Command
February 2005
|
I'd say all of those would be packed tightly as they are, since they're not a hodgepodge of sizes. They all watch too much MSNBC... they get ideas. |
|
Indeterminatus
Member #737
November 2000
|
OnlineCop said: Not to rerail this topic or anything, but what should I do for this code? I wouldn't worry about (manual) padding at all. This makes sense only in very tight environments, and there you know the target architecture exactly. Chances are high you get it right for one setting and make it worse for another, so just let the tool-chain do its job and let it decide that for you. _______________________________ |
|
Oscar Giner
Member #2,207
April 2002
|
Thomas: when did I mentioned anything about cache? I think what you got wrong is the definition of data alignment. For starters, by definition, a byte is always aligned; there's no such thing as a misaligned byte. If you don't want to trust me or that wikipedia article, I'll just copy here a paragrahp of the IA32 Intel Architecture Software Developer Manual book: (note: it's using the intel asm terminology for naming data types, so a word means 2 bytes, doubleword 4 bytes and quadword 8 bytes) Quote: Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. (The natural boundaries for words, double words, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively.) However, to improve performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. See how it doesn't mention bytes anywhere, and words (shorts in our discussion) should just be aligned to even boundaries (not necessarily to addresses divisible by 4 as you say) PD: you should first document yourself properly. I hate when someone discusses about something he doesn't really know and doesn't even seem interested at all about learning the truth. I've studied this at university, and I've worked with a Sparc CPU (yes one of those that don't allow misaligned memory accesses). -- |
|
Thomas Fjellstrom
Member #476
June 2000
|
Oscar Giner said: See how it doesn't mention bytes anywhere, and words (shorts in our discussion) should just be aligned to even boundaries (not necessarily to addresses divisible by 4 as you say) Again, as I mentioned, it works fine. But it doesn't mean its as efficient as aligning to a native word (ie: 4 or 8 bytes). Also as I mentioned, some CPUs just don't support accessing data not aligned to a native word size. Quote: when did I mentioned anything about cache? Its the only "line size" I know about in regards to CPUs. -- |
|
Arthur Kalliokoski
Second in Command
February 2005
|
http://en.wikipedia.org/wiki/Conventional_PCI#Conventional_hardware_specifications http://en.wikipedia.org/wiki/PCI_Express I've read elsewhere that past Pentium I (?) cache line size is always 64 bytes. It doesn't "match" the motherboard bus size, but so what? Neither does HDD sector or cluster size. OTOH, my mobo is 64 bit, which implies at least 64 bit bus (I hope), but I was under the impression it's not PCI express (not mentioned in manual) and there's an AGP video slot. They all watch too much MSNBC... they get ideas. |
|
Oscar Giner
Member #2,207
April 2002
|
Why would it make it less efficient? Maybe you could provide some source? Or you're just guessing? That text clearly states that, for example in case of shorts, they only need to be aligned to even addresses, and you seem to skip that and still say they should be aligned to a word size (address divisible by 4). Anyway I looked at the Optimization Reference Manual, and found a listing for best alignaments, and in this one yes it explictly states that bytes can be at any address without performance penalties. Thomas Fjellstrom said: some CPUs just don't support accessing data not aligned to a native word size. Tell me an example of this. I don't know of any CPU that has such alignment restrictions. The restrictions some CPU's have is that data must be aligned to it's size: shorts must be aligned to even addresses, longs to addresses multiple of 4, doubles or long longs to addresses multiple of 8 and so on. From Sun Sparc documentation (64 bit CPU, so a word is 64 bits): Quote: All quantities must be aligned on their natural boundaries, using standard C data types: short integers are aligned on 16-bit boundaries. int integers are aligned on 32-bit boundaries. long integers are aligned on 64-bit boundaries for SPARC systems. For information on data models, see Appendix C, Making a Device Driver 64-Bit Ready. long long integers are aligned on 64-bit boundaries. As you can see shorts and ints are not required to be aligned to 64 bits, just 16 and 32 respectively. Quote: For example, a structure containing only characters has no alignment restrictions This last quote is to clearly show that bytes don't have any alignment restriction. And I'm finished here if you're not able to provide sources for your claims, since this is getting very stupid. -- |
|
Thomas Fjellstrom
Member #476
June 2000
|
You are right it is getting pretty stupid. I can't find any clear references, mind you I'm not trying very hard. But I still believe that even accessing a single byte that isn't at an aligned location will be slower, at least on some cpus. For a long time intel at least has been implementing some older operations solely in microcode stored in some hidden sram on the die. Though I'm not convinced opcodes that would take byte arguments would be a very wise candidate (not that thats stopped intel and other cpu companies from doing un-wise things) for microcode. I am wondering weather or not you're taking this a little too personally though. It is possible I'm mixing up the full size variable alignment restrictions. I do know that many ARM architectures can not access an unaligned 16 or 32 bit variable. They just throw exceptions. And I may have just extended that to any unaligned data. If they can access unaligned bytes, it wouldn't take much more to extend that to unaligned shorts imo. -- |
|
Arthur Kalliokoski
Second in Command
February 2005
|
Thomas Fjellstrom said: I do know that many ARM architectures can not access an unaligned 16 or 32 bit variable. They just throw exceptions. Even a 486 will do that if you set the correct (protected) bit in the flags register. They all watch too much MSNBC... they get ideas. |
|
Oscar Giner
Member #2,207
April 2002
|
This is the best I could find: http://webster.cs.ucr.edu/AoA/Windows/HTML/SystemOrganizationa2.html Skip to the middle of the page, when it starts talking about 16 bit data buses, later it also explains 32 bit. -- |
|
OnlineCop
Member #7,919
October 2006
|
Thanks all!
|
|
|
1
2
|