Allegro.cc - Online Community

Allegro.cc Forums » Programming Questions » Preallocating arrays: specified size or next-power-of-two

This thread is locked; no one can reply to it. rss feed Print
 1   2 
Preallocating arrays: specified size or next-power-of-two
OnlineCop
Member #7,919
October 2006
avatar

For support of a game map, I'm creating a struct which contains an array of objects, its size, and the capacity.

(This is just the code I'm using, if you need to see it.)

struct s_bound
{
   short left,right,top,bottom;
   short btile;
};


struct s_bound_array
{
   unsigned int size;     // Number of items actually used
   unsigned int capacity; // Number of (pre)allocated items
   s_bound* array;        // Array of s_bound objects itself
};

To add another s_bound object to the array, I use the following function:

#SelectExpand
1int add_bound_item (s_bound_array *which_array, s_bound *which_item) 2{ 3 assert (which_array && "which_array == NULL"); 4 assert (which_item && "which_item == NULL"); 5 6 if (which_array->size >= which_array->capacity) 7 { 8 which_array->array = realloc (which_array->array, which_array->capacity * 2 * sizeof (s_bound)); 9 assert (which_array->array && "Could not realloc!"); 10 if (which_array->array == 0) // NULL 11 return 0; 12 13 which_array->capacity *= 2; 14 } 15 16 which_array->array[which_array->size].left = which_item->left; 17 which_array->array[which_array->size].right = which_item->right; 18 which_array->array[which_array->size].top = which_item->top; 19 which_array->array[which_array->size].bottom = which_item->bottom; 20 which_array->array[which_array->size].btile = which_item->btile; 21 22 ++(which_array->size); 23 24 return 1; 25}

Works well enough, and it doubles the capacity each time a new object is added and there is not enough memory allocated to handle it, much like C++'s std::vector is implemented.

I'm going to add a new function to preallocate the memory. But my question is, should I allocate just the amount specified by the user (in this case, the map itself indicates the number of objects it needs to read in from a file), or should I bump up that amount to the next-power-of-two for data alignment?

Say that the map has stored 129 objects. Calling the add_bound_item() 129 times will need to reallocate the memory about 8 times, and result in a total capacity of 256. If I preallocate the memory at the beginning, though, I only allocate memory once.

Now, do I allocate 129 or 256 objects (the next-power-of-two)? I have to also consider that the game's script (Lua files) may add/remove any of these bounds as well: if I have capacity set to exactly 129, and add one more to it, the add_bound_item() function above states that the capacity should double, giving 258 (not 256). Not a power-of-two boundary.

Suggestions?

Indeterminatus
Member #737
November 2000
avatar

Look at std::vector. I bet they've tackled the same problem and found the best heuristics.

_______________________________
Indeterminatus. [Atomic Butcher]
si tacuisses, philosophus mansisses

Arthur Kalliokoski
Second in Command
February 2005
avatar

Are you worried about using too much memory or you're worried about performance side effects? (caching, paging etc.) I'd say allocate by multiples of 4096 and call it a day.

[EDIT]

That's if you allocated the start of the block on a 4096 byte boundary.

They all watch too much MSNBC... they get ideas.

anonymous
Member #8025
November 2006

The implementations of vector I know of don't seem to tackle the "problem" at all (reserve basically reserves as much as you ask for).

How often do you expect to run into the worst case scenario? For example, if you reserve 129 and add one later, it means your initial reservation was based on a guess. Why not guess a power of two instead (128)?

Indeterminatus
Member #737
November 2000
avatar

Alright, I would've thought most STL implementations would act a bit more intelligently, but I stand corrected. Principle stays the same, albeit implementation changes: look at Java's ArrayList.

_______________________________
Indeterminatus. [Atomic Butcher]
si tacuisses, philosophus mansisses

Oscar Giner
Member #2,207
April 2002
avatar

The size of the array being a power of two or not doesn't have any kind of effect in performance.

Data alignent impacts performance on a per element basis, so in your case each element of the array should be aligned.

So what's important is that the size of the s_bound struct is a multiple of 4 bytes for a 32 bit app, or 8 bytes for a 64 bit app (something that the compiler may already be doing for you, adding padding to the struct).

Tobias Dammers
Member #2,604
August 2002
avatar

What Oscar said.
If you are in a situation where you know exactly how many elements you are going to store, just allocate enough space to accommodate them, and nothing more.
For dynamic allocation, as provided by std::vector, the most efficient strategy is to double the size whenever the capacity is exceeded. You still don't have to use a power-of-two size, and in fact, a std::vector that is initialized to have a capacity of, say, 3 elements, will probably resize to 6, 12, 24, 48, and so on.

To take care of alignment, you (or the compiler) should make sure individual elements have a size that is a multiple of the platform's native register size, i.e. 4 bytes for 32-bit systems and 8 bytes for 64-bit ones. Chances are your compiler does this automatically for you, depending on compiler options and #pragma directives.

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén

anonymous
Member #8025
November 2006

For dynamic allocation, as provided by std::vector, the most efficient strategy is to double the size whenever the capacity is exceeded.

If I'm not mistaken, VC++ uses a factor closer to 1.7. I think the rationale was that this way it is easier to reuse memory blocks released by the vector.

Thomas Fjellstrom
Member #476
June 2000
avatar

Allocating to the next power of two, or doubling the size, does save some time and new/realloc calls. But the actual size doesn't really matter.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

OnlineCop
Member #7,919
October 2006
avatar

I was thinking that, within the add_bound_item() function, I could bump up the next item to the next-power-of-two.

So if they specify a value of 129 to preallocate, it will give exactly 129. But as soon as they want to add another item, it will bump up to 256 (and not 258). Of course, this savings isn't as obvious if they have 126 and add one more: they only get 128 instead of 252.

In order to align the s_bound struct, do I just do that with a preprocessor macro? The 5 shorts now equal 10 bytes: I would need 2 more bytes (1 int) to give me a 4-byte alignment for 32-bit systems, and 6 more bytes (3 ints) for an 8-byte alignment for 64-bit systems. Is that correct?

Arthur Kalliokoski
Second in Command
February 2005
avatar

It's generally best if you manually sort the largest items to smallest items in a struct. If you put the char stuff[3] first, then there'll be padding before the next member.

They all watch too much MSNBC... they get ideas.

OnlineCop
Member #7,919
October 2006
avatar

Arthur: explain?

struct s_bound
{
char alignment[3];
short left,right,top,bottom; short btile; };

or

struct s_bound
{
   short left,right,top,bottom;
   short btile;
char alignment[3];
};

???

Arthur Kalliokoski
Second in Command
February 2005
avatar

Well, I'll be damned, it used to make a difference! I'll try again once I sober up.

#SelectExpand
1#include <stdio.h> 2 3struct s_bound 4{ 5 char align[5]; 6 short shorty; 7 long longy; 8 double dubble; 9}s_bound1; 10 11struct s_bound_two 12{ 13 double dubble; 14 long longy; 15 short shorty; 16 char align[5]; 17}s_bound2; 18 19int main(void) 20{ 21 printf("Size of s_bound 1 is %d\n",sizeof(s_bound1)); 22 printf("Size of s_bound 2 is %d\n",sizeof(s_bound2)); 23 return 0; 24}

They all watch too much MSNBC... they get ideas.

Thomas Fjellstrom
Member #476
June 2000
avatar

padding and alignment can differ on platform and compiler, but in general each short will be aligned to 4 byte boundaries (might be 8 on 64bit native systems), and your char will also start at a 4-8 byte boundary. In the first version you get space between the three chars and the next short, the second one you don't, the struct just ends.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

OnlineCop
Member #7,919
October 2006
avatar

Arthur: Rearrange that a bit to see:

#SelectExpand
1#include <stdio.h> 2 3struct s_bound 4{
5 char align[5];
6 short shorty; 7 long longy; 8 double dubble; 9}s_bound1; 10 11struct s_bound_two 12{ 13 short shorty; 14 long longy; 15 double dubble;
16 char align[5];
17}s_bound2; 18 19int main(void) 20{ 21 printf("Size of s_bound 1 is %d\n",sizeof(s_bound1)); 22 printf("Size of s_bound 2 is %d\n",sizeof(s_bound2)); 23 return 0; 24}

Result:

  Size of s_bound 1 is 20
  Size of s_bound 2 is 24

When I look at Wikipedia and do a test run on my Mac (32-bit, I believe), this is what I see:

#include <stdio.h>

struct s_bound
{
  short left, right, top, bottom;
  short btile;
} s_bound1;

int main(void)
{
  printf("Size of s_bound 1 is %d\n",sizeof(s_bound1));
  return 0;
}

Output:

  Size of s_bound 1 is 10

But will this be the case for 64-bit platforms, or even DIFFERENT 32-bit platforms?

EDIT: Did you rearrange that after you posted? I thought your original had identical structures..?

Arthur Kalliokoski
Second in Command
February 2005
avatar

I'm running on linux 64 for the second week now (utter noob) and I don't feel like rebooting to linux 32.

[EDIT]

And I've drank 9 beers now, I'll log off til tomorrow.

They all watch too much MSNBC... they get ideas.

Dario ff
Member #10,065
August 2008
avatar

OnlineCop said:

Did you rearrange that after you posted? I thought your original had identical structures..?

He did, I saw the structures identical and didn't understand what was Arthur's point.

TranslatorHack 2010, a human translation chain in a.cc.
My games: [GiftCraft] - [Blocky Rhythm[SH2011]] - [Elven Revolution] - [Dune Smasher!]

Arthur Kalliokoski
Second in Command
February 2005
avatar

dario ff said:

He did, I saw the structures identical and didn't understand what was Arthur's point.

Yeah I did, and saw it, and recompiled with the new structs and it still said 24 on linux 64

Quote:

Exercise caution in your daily affairs.

pepsi@fractal:~$ align
Size of s_bound 1 is 24
Size of s_bound 2 is 24
pepsi@fractal:~$

Now I'll log off again and lurk

They all watch too much MSNBC... they get ideas.

Oscar Giner
Member #2,207
April 2002
avatar

padding and alignment can differ on platform and compiler, but in general each short will be aligned to 4 byte boundaries (might be 8 on 64bit native systems), and your char will also start at a 4-8 byte boundary.

Not really. Shorts are aligned to 2 byte boundaries, and chars to 1 byte boundaries (so not really alligned :P).

So

struct s_bound
  {
    char align[5]; 
    short shorty;
    long longy;
    double dubble;
  }s_bound1;

is going to be alligned like this:

        1           2            3            4
00   align[0]  |  align[1]  |  align[2]  |  align[3]
04   align[4]  |  padding   |          shorty
08                        longy
12                        dubble
16                        dubble

(that's 20 bytes)

while

struct s_bound_two
 {
   short shorty;
   long longy;
   double dubble;
   char align[5]; 
 }s_bound2;

will be alligned this way:

        1           2            3            4
00          shorty         | ------ padding -------
04                       longy
08                       dubble
12                       dubble
16  align[0]  |  align[1]  |  align[2]  |  align[3]
20  align[4]  |  ------------ padding -------------

(that's 24 bytes)

This in a 32 bit system.

[edit]
Arthur : can you try this one:

struct s_bound
  {
    char align[5]; 
    short shorty;
    double dubble;
    long longy;
  }s_bound1;

Doubles may be 8 byte alligned, so in the first case you'd get 24 bytes instead of 20.

Arthur Kalliokoski
Second in Command
February 2005
avatar

Nope, still says 24

[EDIT]

reboot to 32 bit, brb

32 bit gcc says sizeof is all 20

They all watch too much MSNBC... they get ideas.

Thomas Fjellstrom
Member #476
June 2000
avatar

Not really. Shorts are aligned to 2 byte boundaries, and chars to 1 byte boundaries (so not really alligned :P).

Maybe in an array. In structs things are generally aligned to the native "word" size. So two single chars would be aligned and padded to separate "words". x86 does work with unaligned accesses, but last I heard, it is slower than unaligned accesses, so you generally want it to align, even if it can waste 3 or 7 bytes per char.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Arthur Kalliokoski
Second in Command
February 2005
avatar

The thing with the unaligned accesses is that even if some variable fits into a register cleanly, the memory bus still has to do multiple accesses if the variable in memory spans the native size.

They all watch too much MSNBC... they get ideas.

Thomas Fjellstrom
Member #476
June 2000
avatar

The thing with the unaligned accesses is that even if some variable fits into a register cleanly, the memory bus still has to do multiple accesses if the variable in memory spans the native size.

Not the only problem. Accessing even a char unaligned may take some extra operations.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Arthur Kalliokoski
Second in Command
February 2005
avatar

Please explain what an unaligned char is, please.

They all watch too much MSNBC... they get ideas.

Oscar Giner
Member #2,207
April 2002
avatar

Thomas: what you don't want is having a variable spawning over several "lines" of memory, because that'd mean that for reading 1 variable you'd need 2 memory accesses.

This is what you don't want:

@    00   01   02   03   04   05   06   07
             |       looong      |

While accessing looong in the x86 is possible (other cpu's don't allow it), it requires 2 accesses (one for retrieving the first word, 00-03, and another to retrieve the second word, 04-07, and then joining the correct bytes). So there's a big performance hit, thus why the alignment.

If you have this:

@    00   01   02   03
      sshort | sshort  |

is ok, and nice since you can read both shorts with just one memory access (after loading you'll need to separate them in 2 different registers, but that's a very fast operation, compared to having 2 memory accesses if you aligned the shorts to word boundaries).

So, google a bit about this. You'll find it's as I said.

 1   2 


Go to: