- Online Community Forums » Off-Topic Ordeals » My favorite programmer, hands down.

This thread is locked; no one can reply to it. rss feed Print
My favorite programmer, hands down.
Chris Katko
Member #1,881
January 2002

Is Mike Acton, a proponent of Data-Oriented Design. DoD wants you to think about the problem down to the fundamental, cacheline level, scope, and think of programs in terms of DATA, not CODE. Code merely transforms the data, the data is what's important.

Code should be designed with caches and architectures in mind. There is no such thing as "generic software"--all software has a subset of target platforms. Nobody is writing code for Google datafarms AND Arduinos AND Apple II's. There is SOME subset of target platforms and within that there are assumptions that can then proliferate through the code.

Optimization is NOT an after-thought. "Premature optimization is the root of all evil" is a statement that's been abused to apply to places that it never was designed for. Fast programs should be designed with understanding of the data from the start, not as an after-thought. That doesn't mean you start with assembly. You START by understanding the fundamental problem down to the bare metal.

Everyone is taught to use classes wrong. Everyone is taught "Want a tree? Make a tree class." But what happens when you want a big tree? Inherit a big tree! Okay, now you want an invisible tree! Sorry, you need multiple inheritance now. What happens when you want a big, invisible, red tree? Enjoy your class explosions.

Classes are data structures. Data structures should be bundled with relevant data that the CPU is going to use. Whether a programmer can visualize it as a real-world object doesn't matter. The programmer isn't the one running the code.

Programmers are taught to treat every object as a SINGLE class, which ends up being horrific for computers. The second you have an *object with .execute() method, you've screwed yourself. Why? Because you've built the assumption that every object is completely unique, sitting by it self, and there's no possibility to run batches of objects together.

Another thing people do is make insanely generic "one class does all" data structures which are impossible to optimize (for a compiler or human). (See last link below.)

DoD teaches you to think (and understand!) of the most common use case, and optimize for that first. Per the OGRE notes below, how often is a single (geometric) plane used and sent around? Or are they almost always sent around in arrays of planes? Then tailor code for arrays of planes!

Here's an amazing talk on DoD which downright pissed off the CppCon folks because they love their abstractions:

A newer talk/interview:

"Adding something to the problem [like unnecessary abstraction], never makes the problem simpler."

"One of the difficult problems in programming... is naming. Because by naming something, you're introducing a problem space--your [unnecessary] mental model into the problem space."

"Trying to apply a 'real world' model to what you've called a 'garbage problem' is not better. [...] You picked a model that's better for your game based on testing, 3 months of hardwork, having a very clear input and output. The output is Tommy smiles. The answer is the stuff you came up with. It's not garbage. It's the right answer. It works. It doesn't matter that there's no name for it. It doesn't matter that it's not 'real'."

"Any game has key metrics whether or not you notice them. If you're movement speed is 10 meters/S, that defines how much assets you can physically stream from your hardware. If you design for 10, and then move up to 20 m/s, you've got to redo everything [art assets, level designs, code, etc] so you can fit in that thinner budget now. And the opposite problem, you go from 10 to 5, and now you're NOT using all of the graphics budget you could have and your game doesn't look competitive." (He also goes on to suggest that critical factors / metrics should be "locked down" where possible to prevent the entire pipeline from spending millions adapting to the new constraint. And understanding "the cost of changing my mind.")

"Then you're optimizing at the end, and any time I hear it suggested, I want to slap somebody."

"If you're going to write for an architecture [x86, x64, ARM], then read the associated CPU's manuals. Period. Otherwise, you're just guessing." [paraphrased]


Check out his complete roasting of the OGRE library which the devs then took to heart and improved their code big time.!593426709_ZX4pZ=


Also, here's a good introduction to Data-oriented Design (by someone else):


Here's a reddit post from a guy who said he benefited from the talks.

Of course, in typical Reddit fashion, there's always a top comment "explaining" that he's full of crap. I think I'll take the words of a proven technical director of a AAA studio over some whiny Redditor.

[edit] OH, I FORGOT Here's one of the best PDF slides on the subject. Goes into LOTS of details yet stays to the point.

Apparently with the Sony Cell processor you HAVE to use DoD or it'll run like piss so Sony put lots of investment into education. I just wish the video of the talk was online.

“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Gideon Weems
Member #3,925
October 2003

Optimization is NOT an after-thought.

Thank you for expressing this. I feel the same way. I also feel sometimes like I am in the minority, so it is nice to hear a big-time programmer support this idea. The most jaw-dropping part of the talk was during Q&A, when he, as the keynote speaker at a C++ convention, told an auditorium full of C++ programmers that he would rather use C99.

A few years ago, my code was completely different than it is now. I basically abstracted myself into a non-existent corner. I have used straight C ever since... Live and learn, I guess.

A few other ideas I enjoyed:

  • Helping the compiler as much as possible by preconditioning loops.

  • "Print it out and zip it."

  • Do not model code after the real world; model it after the actual data... Is this not self-evident, though? Speaking from personal experience, people may tend to do the opposite out of pure pleasure. (Writing "bad" abstract code can be fun.)

Member #5,034
September 2004

All I want is C with namespaces and templates. Since I can't have it, I still to C++ without classes. Losing named initialization in structs is a blow, but an acceptable one.

I disagree with Acton on templates. He opposes them because they kill compile times and recommends the C preprocessor as an alternative. Templates do hurt compile times, but you can actually debug them, as opposed to the preprocessor and they offer some increased flexibility.

The PS3 and XBox 360 both pretty much required data oriented design. They used in-order execution processors, so they weren't nearly as fault tolerant toward shit code.

Chris Katko
Member #1,881
January 2002

I think D actually fixes a lot of the problems he hates about C++, and is still suitable for system-level code. The biggest problem with D is really the lack of ecosystem (people and their projects!) which requires plenty of C and C++ to D wrappers--at which point your data loses all of D's features when they leave your code. That can be "not a problem" for some library calls, but it could be a serialization-esk problem for others.

D's module system has much faster compile times than C++. So much so that the old men on the C++ Committee are considering adding them to C++... a decade late.

And D's templates are WAY more intuitive and powerful, and can have good error messages. Not perfect error messages, but a thousand times better than the STL wall-of-text crap.

I'm working on a (internal) game framework in D right now.

[edit] Oh, one more thing. The most common issue people take with D is the garbage collector. There's nothing in the actual language that requires the use of GC. RAII is just fine. Pointers, malloc, and realloc (<-!!) are all just fine. The standard lib could be replaced with a non-GC standard lib (just like many AAA-studios do when they rip out STL and roll their own). The only reason it doesn't exist is that the community is too small to warrant "re-inventing the wheel" just to appease people that, likely, don't give a crap about D anyway.

“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Mark Oates
Member #1,146
March 2001

Yea, I'm also a fan of Mike Acton - follow him on twitter. I really like his theory and approach in general, which encourages no abstraction, and tries to get you to think about how the computer itself works, like a physical machine.

Member #7,827
October 2006

There's nothing in the actual language that requires the use of GC.

I invite you to try appending to a built in array, using built-in associative arrays and using stateful closures without the GC enabled. You're drinking the D kool-aid a little too strongly. D doesn't have the ecosystem and users for a reason.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Gideon Weems
Member #3,925
October 2003

He opposes them because they kill compile times and recommends the C preprocessor as an alternative.

He recommends dynamic generation of code, using a language with strong text manipulation capabilities. I use Python for this exact purpose. Acton's presentation mentions Perl (though does not explicitly mention how Perl is used).

Then again, perhaps I misunderstood.

I admire Mark for understanding both ends of the spectrum. I saw his code for one of the SpeedHacks. If I remember correctly, it had healthy levels of abstraction--and speaking from personal experience, I find casting real-world light on data-world problems helps me better understand them... so these aren't commandments. Just rules of thumb.

Edit: I've finally thought of a question that I would have asked during the Q&A session. I would have liked to hear more about Acton's aversion to Boolean variables (as opposed to bit fields)... Heck, I'd like to hear from anyone. If taken to the extreme, Booleans have no place in platform/data-centric development... so do you never use Booleans?

Chris Katko
Member #1,881
January 2002

It should be stressed that just because I'm a fan of the guy doesn't mean I do everything he says on every project. I find his alternative style a breathe of fresh air, and apply the techniques where applicable.

In my current project, my objects are so bloody complex and abstract (where scripts do the actual input), that so far, I'm just biting the bullet and keeping them treated individually so I can keep my head wrapped around them. Later on, I may try to switch things in to arrays of data. (Though I am using static arrays / pools, and D's template system allows me to build and adjust the size of these plain arrays very easily.)

On the other hand, I'm working on a message system and a "two phase" object execution system so that all objects can be executed completely (or mostly) independent of each other, and then their "effects" are queued up and applied after all scripts have run. So all objects can be split across as many CPU cores as you have, and the only things that have to be synced are what I call "bridges" between "domains" (a distinct set of objects and memory for a core). That's been pretty fun to design. Since my world can change, and object densities can move around, I'm still looking into my options for dynamically splitting the world into domains ala BSP/quadtrees, or something more context-aware.

SiegeLord said:

I invite you to try appending to a built in array, using built-in associative arrays and using stateful closures without the GC enabled. You're drinking the D kool-aid a little too strongly. D doesn't have the ecosystem and users for a reason.

Again, to be noted, I'm still learning the language. I'm not saying it's perfect. I'm saying the biggest problem is simply the lack of more users. And the reason for the lack of users? It's simple. It doesn't have a big company backing it (Rust/Mozilla. Go/Google. C#/Microsoft.) It isn't taught in most schools because it's a system language. It's not "cool" because it's not a new, crappy, hipster web language... because it's a system language. C and C++ aren't "cool" anymore but have plenty of pre-existing systems.

D is too humble of a language to catch most hipsters. It doesn't say on the webpage that "Everyone else sucks, and using our new Javascript framework is the greatest thing since the Macintosh." But hipsters are dying. So give it some time.

Here's an actual project, that stripped out all GC:

(As well as the linking article on Stack Overflow about GC-less D.)

Up to 300% improvement by replacing the GC and standard lib. That's surely a lot. But it's not orders-of-a-magnitude either. Which is when I would start making a Hard Decision (TM). 300% in 2012, is about equivalent to me starting a project today with a modern CPU.

(Another guy did a GC-less Phobos stdlib for games back in ... 2005.)

That was way back in 2012, and things have improved. (e.g. IIRc, LDC didn't even exist then.) He didn't say it was "impossible" or even complain at all in that article. (I'm sure it was a lot of work, but so is programming in general.) When I ran some memory intensive tests on raw data a year ago, I had about 200% slowdown over C++ with the same code--with zero compiler tweaks or D-specific workarounds.

Yeah, 200% is a lot. But for an indie developer, the time and effort to develop is also a critical factor--not just speed of execution. There's TONS of games that have come out that were written in C# or even freaking JAVA (Minecraft?!). And yet, Minecraft is one of the most popular games of all time. (Ran like piss on my Netbook when it first came out. But my netbook is in parts in a box, and Minecraft is still selling copies.)

I'm building my game in D as an real-world, full project-sized, experiment. And I'll have much more to say about building a game in D when it's done.

So my point with D is not that it's perfect. It's that it's viable. And yes, GC-less D is possible if you're willing to put the effort in and restrict your feature-set a little.

But that's a HUGE DIFFERENCE from trying to run say, C#, without a garbage collector, which is actually impossible.


If taken to the extreme, Booleans have no place in platform/data-centric development... so do you never use Booleans?

While I can't answer what you said, what blew me away in his OGRE slides (see my first post) is that he said, "Why can't a float be used as a boolean?" (instead of casting a rapidly used function from float back to bool before returning the condition value.)

That blew my mind. Floats CAN, and that takes out an unnecessary conversion. Oh sure, we're "taught" in college and by fellow programmers to FEAR floating point numbers. "Never compare a float!!! (except less than or greater than!)" and that "floats aren't accurate!". Yet, I ran a test (and a Google would also suffice) that an IEEE float will actually store ANY INTEGER up to 16,777,216 completely accurately. Nobody bothered to tell me that, yet it's a fact that can be relied on.

So some times, yeah, it'd be faster to leave a float as a float. But we're always taught to needlessly encapsulate things into black boxes, out of the fear as if "nobody could possibly understand these things in real situations, so we should just restrict ourselves to a subset of use cases."

And what's so refreshing about Mike Acton is he says the opposite. "Understand the architecture or you're not an engineer."

Which IS something taught in real engineering departments, like my Mechanical Engineering degree. We were never told, "Hey, physics is hard. So let's just ignore it and design things that are bigger, fatter, and more expensive than they need to be." (Enjoy getting fired!) But somehow, that mentality is pervasive in the programming community. Just like (taking it back to my first post) how Mike Acton wants to slap someone anytime they want to abuse the "Premature Optimization is the root of all evil." quote.

[edit 2]

WAIT, I remember ONE thing related to your boolean question. Bools represent DECISIONS. An object with one bool could actually be two separate object cases. Two bools, four.

So often, instead of using bools, they'll SPLIT into separate arrays and/or SORT in the same single array. Hence the "print the result and zip it." You want your BRANCH decision (for a one bool situation) to all occur ONCE. All bool=false objects at say, the top, and then a sudden switch to all the bool=true objects at the bottom. Or vice-versa. The point is that all the like objects are together so the branch prediction doesn't have to explode as three objects come in =true with their code, and then 1 object with =false which uses a different set of code, and then back and forth, each time killing the branch prediction and destroying the pipeline.

Split data where possible and treat them as independent arrays.

Sort data in the other situation, so that branches only occur at the change-over point.

I've seen that mentioned in a bunch of game programmer conferences... not just Mike Acton.

Now, I'm still not sure the best way to address objects that are FLIPPING state dynamically. You could "sort" it, but then, you're still branching in that sort. However, if the sort takes the "branch hit" once, and then the rest of your code references those objects multiple times (where they'd hit those branches many times) that may work... also, you could do a subset sorting algorithm with only "reduces branching" but not "eliminates" yet still improves the overall performance.

That is, even with a program that only reads a dataset ONCE per frame, you might be able to SORT some subset of that dataset to reduce the entropy a little bit and have the additional sorting cost of that reduction be less than the branching cost of an unsorted data set. So you "clean up" your data set a little bit each time.

Then again, that would still require the main program to HAVE those if(bool) statements, where as the splitting method removes them altogether. So like I said, I haven't figured out the best solution yet. But if they were all in the same array, reducing the entropy a little bit each frame, may work great when your dataset only increases entropy a little bit each frame. (That is, some objects change bool state, but not ALL. So where they WERE sorted, they're now "mostly sorted" and you're constantly reducing that state by a little bit with your sort algorithm.)

“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Gideon Weems
Member #3,925
October 2003

So often, instead of using bools, they'll SPLIT into separate arrays and/or SORT in the same single array.

Ahh, that makes sense.

Chris Katko
Member #1,881
January 2002

Hey Siege, about Closures, I've not really considered how they're handed in memory. However, I just randomly stumbled on a comment that references GC and closures:


As a fun fact, if you modify druntime's allocator to be malloc(), you can use free(delegate.ptr) to manually manage closures, though this takes a lot of care to know if it actually should be freed or not.

Interesting... not sure if it answers your question/concerns about GC-free closures.

What are you typically using closures for?

“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs

Member #4,603
May 2004

Here is a collection of nice resources regarding data-oriented design:

Go to: