|
This thread is locked; no one can reply to it. |
1
2
|
Preallocating arrays: specified size or next-power-of-two |
OnlineCop
Member #7,919
October 2006
|
For support of a game map, I'm creating a struct which contains an array of objects, its size, and the capacity. (This is just the code I'm using, if you need to see it.)
struct s_bound { short left,right,top,bottom; short btile; }; struct s_bound_array { unsigned int size; // Number of items actually used unsigned int capacity; // Number of (pre)allocated items s_bound* array; // Array of s_bound objects itself }; To add another s_bound object to the array, I use the following function: 1int add_bound_item (s_bound_array *which_array, s_bound *which_item)
2{
3 assert (which_array && "which_array == NULL");
4 assert (which_item && "which_item == NULL");
5
6 if (which_array->size >= which_array->capacity)
7 {
8 which_array->array = realloc (which_array->array, which_array->capacity * 2 * sizeof (s_bound));
9 assert (which_array->array && "Could not realloc!");
10 if (which_array->array == 0) // NULL
11 return 0;
12
13 which_array->capacity *= 2;
14 }
15
16 which_array->array[which_array->size].left = which_item->left;
17 which_array->array[which_array->size].right = which_item->right;
18 which_array->array[which_array->size].top = which_item->top;
19 which_array->array[which_array->size].bottom = which_item->bottom;
20 which_array->array[which_array->size].btile = which_item->btile;
21
22 ++(which_array->size);
23
24 return 1;
25}
Works well enough, and it doubles the capacity each time a new object is added and there is not enough memory allocated to handle it, much like C++'s std::vector is implemented. I'm going to add a new function to preallocate the memory. But my question is, should I allocate just the amount specified by the user (in this case, the map itself indicates the number of objects it needs to read in from a file), or should I bump up that amount to the next-power-of-two for data alignment? Say that the map has stored 129 objects. Calling the add_bound_item() 129 times will need to reallocate the memory about 8 times, and result in a total capacity of 256. If I preallocate the memory at the beginning, though, I only allocate memory once. Now, do I allocate 129 or 256 objects (the next-power-of-two)? I have to also consider that the game's script (Lua files) may add/remove any of these bounds as well: if I have capacity set to exactly 129, and add one more to it, the add_bound_item() function above states that the capacity should double, giving 258 (not 256). Not a power-of-two boundary. Suggestions?
|
Indeterminatus
Member #737
November 2000
|
Look at std::vector. I bet they've tackled the same problem and found the best heuristics. _______________________________ |
Arthur Kalliokoski
Second in Command
February 2005
|
Are you worried about using too much memory or you're worried about performance side effects? (caching, paging etc.) I'd say allocate by multiples of 4096 and call it a day. [EDIT] That's if you allocated the start of the block on a 4096 byte boundary. They all watch too much MSNBC... they get ideas. |
anonymous
Member #8025
November 2006
|
The implementations of vector I know of don't seem to tackle the "problem" at all (reserve basically reserves as much as you ask for). How often do you expect to run into the worst case scenario? For example, if you reserve 129 and add one later, it means your initial reservation was based on a guess. Why not guess a power of two instead (128)? |
Indeterminatus
Member #737
November 2000
|
Alright, I would've thought most STL implementations would act a bit more intelligently, but I stand corrected. Principle stays the same, albeit implementation changes: look at Java's ArrayList. _______________________________ |
Oscar Giner
Member #2,207
April 2002
|
The size of the array being a power of two or not doesn't have any kind of effect in performance. Data alignent impacts performance on a per element basis, so in your case each element of the array should be aligned. So what's important is that the size of the s_bound struct is a multiple of 4 bytes for a 32 bit app, or 8 bytes for a 64 bit app (something that the compiler may already be doing for you, adding padding to the struct). -- |
Tobias Dammers
Member #2,604
August 2002
|
What Oscar said. To take care of alignment, you (or the compiler) should make sure individual elements have a size that is a multiple of the platform's native register size, i.e. 4 bytes for 32-bit systems and 8 bytes for 64-bit ones. Chances are your compiler does this automatically for you, depending on compiler options and #pragma directives. --- |
anonymous
Member #8025
November 2006
|
Tobias Dammers said: For dynamic allocation, as provided by std::vector, the most efficient strategy is to double the size whenever the capacity is exceeded. If I'm not mistaken, VC++ uses a factor closer to 1.7. I think the rationale was that this way it is easier to reuse memory blocks released by the vector. |
Thomas Fjellstrom
Member #476
June 2000
|
Allocating to the next power of two, or doubling the size, does save some time and new/realloc calls. But the actual size doesn't really matter. -- |
OnlineCop
Member #7,919
October 2006
|
I was thinking that, within the add_bound_item() function, I could bump up the next item to the next-power-of-two. So if they specify a value of 129 to preallocate, it will give exactly 129. But as soon as they want to add another item, it will bump up to 256 (and not 258). Of course, this savings isn't as obvious if they have 126 and add one more: they only get 128 instead of 252. In order to align the s_bound struct, do I just do that with a preprocessor macro? The 5 shorts now equal 10 bytes: I would need 2 more bytes (1 int) to give me a 4-byte alignment for 32-bit systems, and 6 more bytes (3 ints) for an 8-byte alignment for 64-bit systems. Is that correct?
|
Arthur Kalliokoski
Second in Command
February 2005
|
It's generally best if you manually sort the largest items to smallest items in a struct. If you put the char stuff[3] first, then there'll be padding before the next member. They all watch too much MSNBC... they get ideas. |
OnlineCop
Member #7,919
October 2006
|
Arthur: explain? struct s_bound { or struct s_bound { short left,right,top,bottom; short btile;
|
Arthur Kalliokoski
Second in Command
February 2005
|
Well, I'll be damned, it used to make a difference! I'll try again once I sober up. 1#include <stdio.h>
2
3struct s_bound
4{
5 char align[5];
6 short shorty;
7 long longy;
8 double dubble;
9}s_bound1;
10
11struct s_bound_two
12{
13 double dubble;
14 long longy;
15 short shorty;
16 char align[5];
17}s_bound2;
18
19int main(void)
20{
21 printf("Size of s_bound 1 is %d\n",sizeof(s_bound1));
22 printf("Size of s_bound 2 is %d\n",sizeof(s_bound2));
23 return 0;
24}
They all watch too much MSNBC... they get ideas. |
Thomas Fjellstrom
Member #476
June 2000
|
padding and alignment can differ on platform and compiler, but in general each short will be aligned to 4 byte boundaries (might be 8 on 64bit native systems), and your char will also start at a 4-8 byte boundary. In the first version you get space between the three chars and the next short, the second one you don't, the struct just ends. -- |
OnlineCop
Member #7,919
October 2006
|
Arthur: Rearrange that a bit to see: 1#include <stdio.h>
2
3struct s_bound
4{
5 char align[5]; 6 short shorty;
7 long longy;
8 double dubble;
9}s_bound1;
10
11struct s_bound_two
12{
13 short shorty;
14 long longy;
15 double dubble;
16 char align[5]; 17}s_bound2;
18
19int main(void)
20{
21 printf("Size of s_bound 1 is %d\n",sizeof(s_bound1));
22 printf("Size of s_bound 2 is %d\n",sizeof(s_bound2));
23 return 0;
24}
Result: Size of s_bound 1 is 20 Size of s_bound 2 is 24 When I look at Wikipedia and do a test run on my Mac (32-bit, I believe), this is what I see: #include <stdio.h> struct s_bound { short left, right, top, bottom; short btile; } s_bound1; int main(void) { printf("Size of s_bound 1 is %d\n",sizeof(s_bound1)); return 0; } Output: Size of s_bound 1 is 10 But will this be the case for 64-bit platforms, or even DIFFERENT 32-bit platforms? EDIT: Did you rearrange that after you posted? I thought your original had identical structures..?
|
Arthur Kalliokoski
Second in Command
February 2005
|
I'm running on linux 64 for the second week now (utter noob) and I don't feel like rebooting to linux 32. [EDIT] And I've drank 9 beers now, I'll log off til tomorrow. They all watch too much MSNBC... they get ideas. |
Dario ff
Member #10,065
August 2008
|
OnlineCop said: Did you rearrange that after you posted? I thought your original had identical structures..? He did, I saw the structures identical and didn't understand what was Arthur's point. TranslatorHack 2010, a human translation chain in a.cc. |
Arthur Kalliokoski
Second in Command
February 2005
|
dario ff said: He did, I saw the structures identical and didn't understand what was Arthur's point. Yeah I did, and saw it, and recompiled with the new structs and it still said 24 on linux 64 Quote: Exercise caution in your daily affairs. pepsi@fractal:~$ align Now I'll log off again and lurk They all watch too much MSNBC... they get ideas. |
Oscar Giner
Member #2,207
April 2002
|
Thomas Fjellstrom said: padding and alignment can differ on platform and compiler, but in general each short will be aligned to 4 byte boundaries (might be 8 on 64bit native systems), and your char will also start at a 4-8 byte boundary. Not really. Shorts are aligned to 2 byte boundaries, and chars to 1 byte boundaries (so not really alligned ). So struct s_bound { char align[5]; short shorty; long longy; double dubble; }s_bound1; is going to be alligned like this: 1 2 3 4 00 align[0] | align[1] | align[2] | align[3] 04 align[4] | padding | shorty 08 longy 12 dubble 16 dubble (that's 20 bytes) while struct s_bound_two { short shorty; long longy; double dubble; char align[5]; }s_bound2; will be alligned this way: 1 2 3 4 00 shorty | ------ padding ------- 04 longy 08 dubble 12 dubble 16 align[0] | align[1] | align[2] | align[3] 20 align[4] | ------------ padding ------------- (that's 24 bytes) This in a 32 bit system. [edit] struct s_bound { char align[5]; short shorty; double dubble; long longy; }s_bound1; Doubles may be 8 byte alligned, so in the first case you'd get 24 bytes instead of 20. -- |
Arthur Kalliokoski
Second in Command
February 2005
|
Nope, still says 24 [EDIT] reboot to 32 bit, brb 32 bit gcc says sizeof is all 20 They all watch too much MSNBC... they get ideas. |
Thomas Fjellstrom
Member #476
June 2000
|
Oscar Giner said: Not really. Shorts are aligned to 2 byte boundaries, and chars to 1 byte boundaries (so not really alligned ). Maybe in an array. In structs things are generally aligned to the native "word" size. So two single chars would be aligned and padded to separate "words". x86 does work with unaligned accesses, but last I heard, it is slower than unaligned accesses, so you generally want it to align, even if it can waste 3 or 7 bytes per char. -- |
Arthur Kalliokoski
Second in Command
February 2005
|
The thing with the unaligned accesses is that even if some variable fits into a register cleanly, the memory bus still has to do multiple accesses if the variable in memory spans the native size. They all watch too much MSNBC... they get ideas. |
Thomas Fjellstrom
Member #476
June 2000
|
Arthur Kalliokoski said: The thing with the unaligned accesses is that even if some variable fits into a register cleanly, the memory bus still has to do multiple accesses if the variable in memory spans the native size. Not the only problem. Accessing even a char unaligned may take some extra operations. -- |
Arthur Kalliokoski
Second in Command
February 2005
|
Please explain what an unaligned char is, please. They all watch too much MSNBC... they get ideas. |
Oscar Giner
Member #2,207
April 2002
|
Thomas: what you don't want is having a variable spawning over several "lines" of memory, because that'd mean that for reading 1 variable you'd need 2 memory accesses. This is what you don't want: @ 00 01 02 03 04 05 06 07 | looong | While accessing looong in the x86 is possible (other cpu's don't allow it), it requires 2 accesses (one for retrieving the first word, 00-03, and another to retrieve the second word, 04-07, and then joining the correct bytes). So there's a big performance hit, thus why the alignment. If you have this: @ 00 01 02 03 sshort | sshort | is ok, and nice since you can read both shorts with just one memory access (after loading you'll need to separate them in 2 different registers, but that's a very fast operation, compared to having 2 memory accesses if you aligned the shorts to word boundaries). So, google a bit about this. You'll find it's as I said. -- |
|
1
2
|