Allegro.cc - Online Community

Allegro.cc Forums » Off-Topic Ordeals » Don't make my mistake!

This thread is locked; no one can reply to it. rss feed Print
 1   2 
Don't make my mistake!
Chris Katko
Member #1,881
January 2002
avatar

Luckily, this wasn't a difficult bug to track down because my framework was a prototype and concise.

I've gotten into the bad habit of using std::vector even when a std::list will do. Mistakenly thinking that a vector was valid in all cases a list would be, but only slower. I filed that under leaving optimization to the end of development.

Unfortunately, if you're using an iterator, and you add to a vector? The iterators become invalid. Any time the vector grows beyond it's internal size (which is implementation dependent), it will reallocate a whole new array and copy the old over. New array means completely new addresses that the iterators don't know about. Which makes sense, but that can become encapsulated from your mind if you're not careful!

Here's a relevant stack overflow.

I had a hallway traversal algorithm using std::vector that tracked multiple points on a grid and added points as they found additional branches. This quickly blew up in my face once the hallways became more numerous.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Thomas Fjellstrom
Member #476
June 2000
avatar

You're not supposed to store iterators, and modifying the container while iterating is normally frowned upon. ;)

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Chris Katko
Member #1,881
January 2002
avatar

I don't store iterators. And I'm not really sure how one is supposed to have a list of containers and not be able to modify them.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Thomas Fjellstrom
Member #476
June 2000
avatar

You can modify them, but not while you're iterating over the entire thing :P

Sure it can be done, but it's a pain in the ass. Some languages actually forbid it. IIRC ObjC stuff will throw an exception if you attempt to modify a container when an iterator is open.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Chris Katko
Member #1,881
January 2002
avatar

Wow, I didn't realize that was such an issue.

It seems pretty straight-forward to delete containers you're done with as you pass over them instead of computing the logic for whether they should be deleted in one pass, and actually deleting them in a second pass.

Here's my progress so far, as pictures are so much fun:

{"name":"e8Q4rt9.png","src":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/1\/e\/1e6ad0dd62739291ea869175d3085164.png","w":600,"h":570,"tn":"\/\/djungxnpq2nug.cloudfront.net\/image\/cache\/1\/e\/1e6ad0dd62739291ea869175d3085164"}e8Q4rt9.png

The datafield is pressure values for an atmospheric simulation. But the colors aren't pressure values at all. They're basically timestamps. I have an idea for a periodic, pulsing, asynchronous update rate to large data maps so that a CPU isn't destroyed trying to keep up with large map logic.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Thomas Fjellstrom
Member #476
June 2000
avatar

Some libraries with their own containers will provide mutable iterators, or whatever you want to call them. They "remember" the object they were pointing at even if the underlying data changes. Qt is one such thing. It's MVC classes include a QPersistentModelIndex, which does some extra work to allow it's index to not get invalidated by a change in the model. It's a lot of extra work to support in the lib itself, and in any custom Models for that matter.

--
Thomas Fjellstrom - [website] - [email] - [Allegro Wiki] - [Allegro TODO]
"If you can't think of a better solution, don't try to make a better solution." -- weapon_S
"The less evidence we have for what we believe is certain, the more violently we defend beliefs against those who don't agree" -- https://twitter.com/neiltyson/status/592870205409353730

Peter Hull
Member #1,136
March 2001

NZHeadshot had a similar issue recently with deleting while iterating.
https://www.allegro.cc/forums/thread/614476

One idea for adding while iterating is to add to a second collection then, in a second phase, append the whole thing to your first collection. That means one realloc and one memcpy per cycle so it's not too bad.

My mistake is always passing a container by value then wondering why it doesn't get modified. Lost track of the number of times I've done that!

LennyLen
Member #5,313
December 2004
avatar

Lost track of the number of times I've done that!

Probably because you keep passing the variable that stores the number of times by value.

Chris Katko
Member #1,881
January 2002
avatar

NZHeadshot had a similar issue recently with deleting while iterating.

Goodness gracious, why the heck didn't someone just tell him to use a std::list?

std::vector and std::deque are the only exception where you can't do that.


The idiomatic way to write that loop would be...

for(auto i = list.begin(); i != list.end();)
{
if(condition)
i = list.erase(i);
else
++i;
}

You can do the same thing with a set, multiset, map, or multimap. For these containers you can erase an element without affecting the validity to any iterators to other elements. Other containers like vector or deque are not so kind. For those containers only elements before the erased iterator remain untouched. This difference is simply because lists store elements in individually allocated nodes. It's easy to take one link out. vectors are contiguous, taking one element out moves all elements after it back one position.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

SiegeLord
Member #7,827
October 2006
avatar

You shouldn't use std::list merely because it simplifies deleting from it.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Chris Katko
Member #1,881
January 2002
avatar

SiegeLord said:

You shouldn't use std::list merely because it simplifies deleting from it.

You should always use a data structure that reflects the costs and benefits of what you intend to do with it, and you shouldn't be bound to one data structure candidate if your algorithms desire another.

What I'm suggesting is he likely wasn't using std::vector because of the benefits of vectors, but merely because he was familiar with it. And a fundamental question of "are you sure you need vectors" is more useful than "let's try to make vectors do what they're not good at."

[edited, you quick replying monsters. :P]

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Steve Terry
Member #1,989
March 2002
avatar

I thought you were going to say kids....

___________________________________
[ Facebook ]
Microsoft is not the Borg collective. The Borg collective has got proper networking. - planetspace.de
Bill Gates is in fact Shawn Hargreaves' ßî+çh. - Gideon Weems

SiegeLord
Member #7,827
October 2006
avatar

are you sure you need vectors

Vectors are the default and best choice for the vast majority of algorithms. You basically never want std::list.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Chris Katko
Member #1,881
January 2002
avatar

SiegeLord said:

Vectors are the default and best choice for the vast majority of algorithms.

Not any that involve any deleting, or any serious amounts of adding either. Every time you add past the implementation-dependant chunk size (say, for example, a multiple of 64 entries), it has to reallocate an entirely new array and leaves behind a hole in memory that can only fit something that size or smaller (vectors lead to memory fragmentation.)

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Edgar Reynaldo
Major Reynaldo
May 2007
avatar

Chris Katko
Member #1,881
January 2002
avatar

You can resize the vector ahead of time to prevent copying,

If your algorithm allows such a look ahead.

Quote:

and with vectors you generally use pointers and not objects so the objects themselves aren't copied, just their addresses.

That still involves copying gigantic arrays of pointers every time a new object is added (that exceeds the current reserve, which is often if you don't explicitly ask for it.) Because it's required to be contiguous, an entire new array must be allocated every time you exceed that boundary.

Google std::vector memory fragmentation. It's not like I'm pulling this out of my butt.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

SiegeLord
Member #7,827
October 2006
avatar

that exceeds the current reserve, which is often if you don't explicitly ask for it

Eh? Every implementation of std::vector I'm aware of increases the capacity exponentially.

#SelectExpand
1#include <vector> 2#include <stdio.h> 3 4int main() 5{ 6 std::vector<int> vec; 7 int old_capacity = vec.capacity(); 8 for(int ii = 0; ii < 1000000; ii++) 9 { 10 vec.push_back(ii); 11 if(vec.capacity() != old_capacity) 12 { 13 old_capacity = vec.capacity(); 14 printf("Reallocated. Capacity %d\n", old_capacity); 15 } 16 } 17}

Capacity changed 1
Capacity changed 2
Capacity changed 4
Capacity changed 8
Capacity changed 16
Capacity changed 32
Capacity changed 64
Capacity changed 128
Capacity changed 256
Capacity changed 512
Capacity changed 1024
Capacity changed 2048
Capacity changed 4096
Capacity changed 8192
Capacity changed 16384
Capacity changed 32768
Capacity changed 65536
Capacity changed 131072
Capacity changed 262144
Capacity changed 524288
Capacity changed 1048576

That's 21 reallocations for 1 million elements added... that doesn't count as often to me.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Gideon Weems
Member #3,925
October 2003

Bjarne Stroustrap agrees with SiegeLord.

video

Chris Katko
Member #1,881
January 2002
avatar

And Reddit appears to disagree with using his sensationalist title as a slogan. Moreover, the title was made up by the YouTube uploader, not him, so did you actually watch the video? But, back to the discussion:

Quote:

This benchmark teaches absolutely nothing new about caches or linked lists or arrays.

The algorithms Bjarne is benchmarking are both O(n) regardless of array or linked list, and it's not some kind of mind blowing result that arrays have smaller constant factors and hence will outperform a linked list for linear algorithms. This is honestly basic first year CS material.

Linked lists are typically used when you want to iterate over a container, and on every iteration you may mutate the container itself by either adding to it or deleting from it. In such a case the linked list will reduce the time complexity of an algorithm from being quadratic to being linear.

And the absolute key:

Quote:

A win over what? Is contiguous memory a win over using an algorithm with a smaller time complexity?

If you have a fast algorithm that uses deletes and inserts in the middle, as it runs, you pay no penalty for a "linear search" since you already have it*, and no penalty for resizing the vector because resizing isn't necessary.

To blanketly say vectors are always better than lists is pure fallacy.

[edit]

*I appear to be wrong in my misunderstanding of STL. STL uses uninstrusive linked lists which do still require a linear traversal to do a delete. Intrusive linked lists do not suffer that penalty.

In an algorithm that can support it: Since an unsorted vector can just "delete" an arbitrary element by copying the end element into the the deleted space and then popping off the last element in O(1) time, the overhead of traversing the list will most certainly be slower.

[edit] I keep seeing this... but it makes no sense. Can someone explain to me why if I have a pointer / iterator to a specific entry in linked list, I'd still need to perform a linear search?

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

Slartibartfast
Member #8,789
June 2007
avatar

Can someone explain to me why if I have a pointer / iterator to a specific entry in linked list, I'd still need to perform a linear search?

You don't.
See list::erase. Complexity is linear in the number of elements deleted, i.e. constant for a single element. It works exactly as you expect it to.

torhu
Member #2,727
September 2002
avatar

Anyway, I ran SiegeLord's test program on VC 9 (2008):

Quote:

Reallocated. Capacity 1
Reallocated. Capacity 2
Reallocated. Capacity 3
Reallocated. Capacity 4
Reallocated. Capacity 6
Reallocated. Capacity 9
Reallocated. Capacity 13
Reallocated. Capacity 19
Reallocated. Capacity 28
Reallocated. Capacity 42
Reallocated. Capacity 63
Reallocated. Capacity 94
Reallocated. Capacity 141
Reallocated. Capacity 211
Reallocated. Capacity 316
Reallocated. Capacity 474
Reallocated. Capacity 711
Reallocated. Capacity 1066
Reallocated. Capacity 1599
Reallocated. Capacity 2398
Reallocated. Capacity 3597
Reallocated. Capacity 5395
Reallocated. Capacity 8092
Reallocated. Capacity 12138
Reallocated. Capacity 18207
Reallocated. Capacity 27310
Reallocated. Capacity 40965
Reallocated. Capacity 61447
Reallocated. Capacity 92170
Reallocated. Capacity 138255
Reallocated. Capacity 207382
Reallocated. Capacity 311073
Reallocated. Capacity 466609
Reallocated. Capacity 699913
Reallocated. Capacity 1049869

Can anyone see what the pattern is?

Chris Katko
Member #1,881
January 2002
avatar

Is that sarcastic, or serious? Because it's been awhile since I've done recurrence relations (?) and I honestly can't tell yet.

Mine in 64-bit Linux VM shows the same as SiegeLord.

#SelectExpand
1Reallocated. Capacity 1 2Reallocated. Capacity 2 3Reallocated. Capacity 4 4Reallocated. Capacity 8 5Reallocated. Capacity 16 6Reallocated. Capacity 32 7Reallocated. Capacity 64 8Reallocated. Capacity 128 9Reallocated. Capacity 256 10Reallocated. Capacity 512 11Reallocated. Capacity 1024 12Reallocated. Capacity 2048 13Reallocated. Capacity 4096 14Reallocated. Capacity 8192 15Reallocated. Capacity 16384 16Reallocated. Capacity 32768 17Reallocated. Capacity 65536 18Reallocated. Capacity 131072 19Reallocated. Capacity 262144 20Reallocated. Capacity 524288 21Reallocated. Capacity 1048576

The change in yours from each previous is:

Quote:

1
2 1
3 1
4 1
6 2
9 3
13 4
19 6
28 9
42 14
63 21
94 31
141 47
211 70
316 105
474 158
711 237
1066 355
1599 533
2398 799
3597 1199
5395 1798
8092 2697
12138 4046
18207 6069
27310 9103
40965 13655
61447 20482
92170 30723
138255 46085
207382 69127
311073 103691
466609 155536
699913 233304
1049869 349956

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

torhu
Member #2,727
September 2002
avatar

Is that sarcastic, or serious? Because it's been awhile since I've done recurrence relations (?) and I honestly can't tell yet.

Don't worry, it's probably just some GCC and Microsoft developers conspiring to ruin your day ::)

SiegeLord
Member #7,827
October 2006
avatar

torhu said:

Can anyone see what the pattern is?

It's just multiplying the capacity by 1.5 rather than 2. 2 is the usual multiplier, while 1.5 is the conservative one. It has to be exponential, however, to be standard compliant.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

torhu
Member #2,727
September 2002
avatar

So is it not standard compliant?

 1   2 


Go to: