Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » Preallocating arrays: specified size or next-power-of-two

This thread is locked; no one can reply to it. rss feed Print
 1   2 
Preallocating arrays: specified size or next-power-of-two
Arthur Kalliokoski
Second in Command
February 2005
avatar

http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86

[EDIT]
I didn't see Oscar's post before I posted this, what he means is:

The memory management stuff in the computer accesses some certain size all the time,
assume 32 bit bus.

Access a char (byte)
Nxxx xxxx
xNxx xxxx //All fit within a 32 bit access
xxNx xxxx
xxxN xxxx
xxxx Nxxx...

Access a 16 bit int
NNxx xxxx
xNNx xxxx
xxxN Nxxx <-crosses 32 bit boundary
xxxx NNxx...

same for 32 bit var
NNNN xxxx
xNNN Nxxx <-crosses
  .
  .
  .
xxxN NNNx

xxxx NNNN <-aligned again

They all watch too much MSNBC... they get ideas.

Thomas Fjellstrom
Member #476
June 2000
avatar

Please explain what an unaligned char is, please.

Any access that doesn't start at an aligned address, just because the amount of data is smaller doesn't make it ok (its just not as bad).

Thomas: what you don't want is having a variable spawning over several "lines" of memory, because that'd mean that for reading 1 variable you'd need 2 memory accesses.

I'm not disagreeing. Just that a single char read, not at an aligned boundary isn't as fast as one that is aligned. Some CPUs can't even access data thats unaligned without doing a larger read that is aligned, and then throwing away the bits it doesn't want. Just because the IA32 and x86_64 arch can do unaligned accesses, doesn't mean its as efficient as aligned accesses.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Oscar Giner
Member #2,207
April 2002
avatar

I'm not disagreeing. Just that a single char read, not at an aligned boundary isn't as fast as one that is aligned.

Care to post your source? Because that's not true. Memory must be accessed word by word (that's called a line), it's not possible to read just one byte (well, unless you're working with an 8 bit cpu where a word is 1 byte). It doesn't matter where in the word the byte you want is, since the cpu must retrieve the entire word no matter what.

I think you haven't even cared to read that wikipedia article because it explains how data is aligned pretty cleanly :/

Arthur Kalliokoski
Second in Command
February 2005
avatar

Maybe there's a situation where reading a char into AL register is faster if it's on the low byte (little endian)?

They all watch too much MSNBC... they get ideas.

Thomas Fjellstrom
Member #476
June 2000
avatar

Care to post your source?

Just something I've learned reading about how CPUs work.

Quote:

Memory must be accessed word by word (that's called a line)

Uh, a "line" is generally larger, and depends on the CPU architecture, at least if you're talking about the cache line. My Phenom II 810 has a L3 line size of 64 bytes (not bits, 64 bits is the word size).

Quote:

I think you haven't even cared to read that wikipedia article because it explains how data is aligned pretty cleanly :/

Yeah, I didn't read it. I've skimmed it. But I can see how it can help to group and pack individual chars, so it fetches all 4 at the same time, saving some operations. Of course thats only if you group all your chars in the struct into 4-8 in a row groupings. Space them out a bunch, and its pretty inefficient (on x86/x86-64).

And as I mentioned, some cpu's CAN'T access unaligned data at all. All you get is an exception, so if you attempt to access one of those packed chars, you have to fetch an entire word manually, and throw away the bits you don't want, which is what x86 does for you anyhow.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

OnlineCop
Member #7,919
October 2006
avatar

Not to rerail this topic or anything, but what should I do for this code?

struct s_bound
   short left,right,top,bottom,btile; // 5 shorts, no change
};

...

struct s_bound
{
   short left,right,top,bottom,btile;
   short unused; // 1 short to give 6-shorts (or 3 ints) alignment
};

...

struct s_bound
{
   short left,right,top,bottom,btile;
   short unused[3]; // 3 shorts to give 8-shorts (or 4 ints) alignment
};

I don't know whether this code will be run on 32- or 64-bit architecture. Or both. So would it be better to waste space and try to align the data on a 64-bit model, or pad it for a 32-bit model?

Or write a preprocessor macro which would fill in the appropriate number depending?

#ifdef 64_bit_architecture
# define PADDING(bytes_used) char unused[8-(bytes_used)]
#else
# define PADDING(bytes_used) char unused[4-(bytes_used)]
#endif

struct s_bound
{
   short left,right,top,bottom,btile;
   PADDING(6); // 6 bytes, or 3 shorts, to give a total of 8 shorts (4 ints)
};

Arthur Kalliokoski
Second in Command
February 2005
avatar

I'd say all of those would be packed tightly as they are, since they're not a hodgepodge of sizes.

They all watch too much MSNBC... they get ideas.

Indeterminatus
Member #737
November 2000
avatar

OnlineCop said:

Not to rerail this topic or anything, but what should I do for this code?

I wouldn't worry about (manual) padding at all. This makes sense only in very tight environments, and there you know the target architecture exactly. Chances are high you get it right for one setting and make it worse for another, so just let the tool-chain do its job and let it decide that for you.

_______________________________
Indeterminatus. [Atomic Butcher]
si tacuisses, philosophus mansisses

Oscar Giner
Member #2,207
April 2002
avatar

Thomas: when did I mentioned anything about cache? I think what you got wrong is the definition of data alignment. For starters, by definition, a byte is always aligned; there's no such thing as a misaligned byte. If you don't want to trust me or that wikipedia article, I'll just copy here a paragrahp of the IA32 Intel Architecture Software Developer Manual book:

(note: it's using the intel asm terminology for naming data types, so a word means 2 bytes, doubleword 4 bytes and quadword 8 bytes)

Quote:

Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. (The natural boundaries for words, double words, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively.) However, to improve performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible.

See how it doesn't mention bytes anywhere, and words (shorts in our discussion) should just be aligned to even boundaries (not necessarily to addresses divisible by 4 as you say)

PD: you should first document yourself properly. I hate when someone discusses about something he doesn't really know and doesn't even seem interested at all about learning the truth. I've studied this at university, and I've worked with a Sparc CPU (yes one of those that don't allow misaligned memory accesses).

Thomas Fjellstrom
Member #476
June 2000
avatar

See how it doesn't mention bytes anywhere, and words (shorts in our discussion) should just be aligned to even boundaries (not necessarily to addresses divisible by 4 as you say)

Again, as I mentioned, it works fine. But it doesn't mean its as efficient as aligning to a native word (ie: 4 or 8 bytes). Also as I mentioned, some CPUs just don't support accessing data not aligned to a native word size.

Quote:

when did I mentioned anything about cache?

Its the only "line size" I know about in regards to CPUs.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Arthur Kalliokoski
Second in Command
February 2005
avatar

http://en.wikipedia.org/wiki/Conventional_PCI#Conventional_hardware_specifications
(old skool PCI)
says PCI bus width is 32 bits.

http://en.wikipedia.org/wiki/PCI_Express
(since 2004)
has a sidebar saying "width in bits: 1 - 32"
but also says data transmission is serial :o

I've read elsewhere that past Pentium I (?) cache line size is always 64 bytes. It doesn't "match" the motherboard bus size, but so what? Neither does HDD sector or cluster size.

OTOH, my mobo is 64 bit, which implies at least 64 bit bus (I hope), but I was under the impression it's not PCI express (not mentioned in manual) and there's an AGP video slot.

They all watch too much MSNBC... they get ideas.

Oscar Giner
Member #2,207
April 2002
avatar

Why would it make it less efficient? Maybe you could provide some source? Or you're just guessing? That text clearly states that, for example in case of shorts, they only need to be aligned to even addresses, and you seem to skip that and still say they should be aligned to a word size (address divisible by 4). Anyway I looked at the Optimization Reference Manual, and found a listing for best alignaments, and in this one yes it explictly states that bytes can be at any address without performance penalties.

some CPUs just don't support accessing data not aligned to a native word size.

Tell me an example of this. I don't know of any CPU that has such alignment restrictions. The restrictions some CPU's have is that data must be aligned to it's size: shorts must be aligned to even addresses, longs to addresses multiple of 4, doubles or long longs to addresses multiple of 8 and so on.

From Sun Sparc documentation (64 bit CPU, so a word is 64 bits):

Quote:

All quantities must be aligned on their natural boundaries, using standard C data types:

short integers are aligned on 16-bit boundaries.

int integers are aligned on 32-bit boundaries.

long integers are aligned on 64-bit boundaries for SPARC systems. For information on data models, see Appendix C, Making a Device Driver 64-Bit Ready.

long long integers are aligned on 64-bit boundaries.

As you can see shorts and ints are not required to be aligned to 64 bits, just 16 and 32 respectively.

Quote:

For example, a structure containing only characters has no alignment restrictions

This last quote is to clearly show that bytes don't have any alignment restriction.

And I'm finished here if you're not able to provide sources for your claims, since this is getting very stupid.

Thomas Fjellstrom
Member #476
June 2000
avatar

You are right it is getting pretty stupid. I can't find any clear references, mind you I'm not trying very hard. But I still believe that even accessing a single byte that isn't at an aligned location will be slower, at least on some cpus. For a long time intel at least has been implementing some older operations solely in microcode stored in some hidden sram on the die. Though I'm not convinced opcodes that would take byte arguments would be a very wise candidate (not that thats stopped intel and other cpu companies from doing un-wise things) for microcode.

I am wondering weather or not you're taking this a little too personally though.

It is possible I'm mixing up the full size variable alignment restrictions. I do know that many ARM architectures can not access an unaligned 16 or 32 bit variable. They just throw exceptions. And I may have just extended that to any unaligned data. If they can access unaligned bytes, it wouldn't take much more to extend that to unaligned shorts imo.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Arthur Kalliokoski
Second in Command
February 2005
avatar

I do know that many ARM architectures can not access an unaligned 16 or 32 bit variable. They just throw exceptions.

Even a 486 will do that if you set the correct (protected) bit in the flags register.

They all watch too much MSNBC... they get ideas.

Oscar Giner
Member #2,207
April 2002
avatar

This is the best I could find: http://webster.cs.ucr.edu/AoA/Windows/HTML/SystemOrganizationa2.html

Skip to the middle of the page, when it starts talking about 16 bit data buses, later it also explains 32 bit.

OnlineCop
Member #7,919
October 2006
avatar

Thanks all!

 1   2 


Go to: