
Credits go to HoHo and Steve Terry for helping out!
3D accel newbie
Arthur Kalliokoski
Second in Command
February 2005

I just got an old computer that happens to have 3D accelerated graphics. I've been googling trying to get up to speed on OpenGL, alleggl, etc., but those resources mostly seem to deal with toy examples showing some object rotating, waving, whatever. I've already skimmed through the Red Book and Blue Book.

I guess what I'm looking for is a much more hardware-oriented viewpoint on 3D acceleration. I want to find some web pages that answer questions like: does the 3D card have a floating-point unit of its own? Why do certain constraints come into play, such as texture size limits?

Any good links?

They all watch too much MSNBC... they get ideas.

Steve Terry
Member #1,989
March 2002

Today's 3D cards are becoming more like general-purpose processors than the fixed-function cards of a few years ago, hence the term GPU. They can now run generalized fragment programs and pixel shaders with excellent floating-point throughput, in most cases faster than a CPU. Before long, graphics cards will be processors in their own right, and real multi-processor systems will be commonplace (multi-core/multi-processor CPUs plus a generalized graphics processor, sound processor, etc.). Well, we are pretty much already there :P

___________________________________
[ Facebook ]
Microsoft is not the Borg collective. The Borg collective has got proper networking. - planetspace.de
Bill Gates is in fact Shawn Hargreaves' ßî+çh. - Gideon Weems

HoHo
Member #4,534
April 2004

Steve Terry said:

Before long, graphics cards will be processors in their own right, and real multi-processor systems will be commonplace (multi-core/multi-processor CPUs plus a generalized graphics processor, sound processor, etc.). Well, we are pretty much already there :P

Actually, the situation is getting quite interesting. In the old times (and now too) most general-purpose computing (e.g. scene management) was done on the CPU and rendering on the GPU. I plan to reverse that: use a machine with two dual-core CPUs for rendering and a powerful GPU for managing the scene data. Should be fun. Too bad it's only in the planning stage right now; without that monster PC it would also be quite hard to implement well :P

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Arthur Kalliokoski
Second in Command
February 2005

I guess I was premature posting this anyway; after I left allegro.cc I wound up on the Nvidia website, which had a bit of info about ALUs etc.

They also had some SDK stuff with a screenshot that had individual blades of grass visible for 30-50 meters out, and some text blitted on showing something like 100K polygons at 35 fps!

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

Quote:

Does the 3D card have a floating-point unit of its own? Why do certain constraints come into play, such as texture size limits?

If you have particular questions about GPUs, you're free to ask me.

Consumer 3D cards have had multiple floating-point units since the GeForce 256 (4 of them, in fact).

The GeForce 7800 GTX has, in comparison, 224 programmable floating-point units (FMAD), and a whole bunch more used for texture filtering, addressing and blending. There are also specialized circuits for interpolating attributes and computing transcendentals.

The texture size constraints come mainly from precision constraints, which in turn come from chip area constraints. It's much more expensive to support larger texture sizes because of the adders and multipliers that are needed for addressing (and addressing textures is very complex!). You also incur a cost in terms of cache tag size.

--
- Bob
[ -- All my signature links are 404 -- ]

Arthur Kalliokoski
Second in Command
February 2005

Thanks, Bob!

Right now I'm googling for more GL tutorials, more like the Allegro docs than just a bunch of examples like the NeHe ones (which are very impressive). Just today I got gluLookAt to work, because I hadn't noticed I'd always been using all zeros for the up vector. ::) Also looking through the old Pixelate issues today; I'll have to look at them in depth when I get home again.
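For the record, here's roughly what finally worked. The eye and target coordinates are just placeholder numbers; the point is that the up vector is non-zero and not parallel to the viewing direction:

#include <GL/glu.h>

/* Set up the camera each frame.  The up vector must be non-zero and must
   not be parallel to (target - eye); an all-zero up vector gives gluLookAt
   a degenerate basis and garbage output. */
void setup_camera(void)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(10.0, 5.0, 10.0,   /* eye position (placeholder values) */
               0.0, 0.0,  0.0,   /* point the camera looks at         */
               0.0, 1.0,  0.0);  /* up vector: +Y, not all zeros      */
}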

Searching the Allegro forums for polls on memory etc. doesn't seem to bring up anything about what video cards are capable of, and the few computer-capability threads I saw were several years old.

Could I assume that the usual video card of today has 64 MB of video memory? Can it do 1024 x 1024 textures? How many vertices fit into a GL_TRIANGLE_STRIP?

Thanks again :)

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

Quote:

Could I assume that the usual video card of today has 64 MB of video memory?

Most likely. All low-end video cards for the last two years have had 128 MB of video memory accessible to them, so 64 MB is found only on cards older than that.

Quote:

Can it do 1024 x 1024 textures?

Every GPU ever built, except the Voodoo 1 to 3.

Quote:

How many vertices fit into a GL_TRIANGLE_STRIP?

As many as you want. The data is streamed to the GPU, so there is no limit.
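For instance, one whole row of a heightfield can go into a single strip. A rough sketch (how the height array is laid out, and the helper name, are just assumptions):

#include <GL/gl.h>

/* Draw one row of a heightfield as a single triangle strip of arbitrary
   length -- the strip itself imposes no vertex limit, the data is simply
   streamed to the card.  'height' is assumed to be a width*depth array. */
void draw_strip_row(const float *height, int width, int z)
{
    int x;
    glBegin(GL_TRIANGLE_STRIP);
    for (x = 0; x < width; x++) {
        glVertex3f((float)x, height[ z      * width + x], (float)z);
        glVertex3f((float)x, height[(z + 1) * width + x], (float)(z + 1));
    }
    glEnd();
}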

--
- Bob
[ -- All my signature links are 404 -- ]

Arthur Kalliokoski
Second in Command
February 2005

All right! Guess I have to figure it out and port a ton of software rendering stuff to make a demo to brag about! Nobody paid any attention to that island thing I hacked into the Allegro software rendering stuff last spring.

My mind still boggles over all that hardware in a video card; why don't we have the same on the mobo so we can have our own supercomputers? Or is it too specialized, maybe?

[EDIT]
Checking out something called ROAM for terrain mapping. I've been doing something like this ever since watching the terrain "pop up" as you get close in a ten-year-old game called EF2000. ROAM is supposed to work very well with 3D acceleration. I was also thinking about (in the software renderer) running the game with flat shading and automatic camera movement while I sleep: color each polygon a unique color and keep a database of which polygons are visible from which areas. I'd have to compress it to run-length bitfields; I haven't seen this done anywhere else.
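Roughly what I have in mind, if the visibility pass were done through GL instead of my own renderer (draw_polygon_flat() is just a stand-in for however the polygons actually get drawn, and the rest is a sketch):

#include <GL/gl.h>
#include <stdlib.h>

void draw_polygon_flat(int poly);   /* hypothetical: the real drawing code */

/* One visibility sample: draw every polygon flat-shaded in a unique color
   that encodes its index (plus one, so black stays "no polygon"), read the
   frame back, and mark which indices actually reached the screen. */
void record_visible(unsigned char *visible, int num_polys,
                    int screen_w, int screen_h)
{
    unsigned char *pixels = malloc(screen_w * screen_h * 3);
    int i, p;

    glShadeModel(GL_FLAT);
    glDisable(GL_LIGHTING);
    glDisable(GL_TEXTURE_2D);
    glDisable(GL_DITHER);        /* dithering would corrupt the encoded IDs */
    glClearColor(0, 0, 0, 0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    for (i = 0; i < num_polys; i++) {
        int id = i + 1;          /* pack the polygon index into 24 bits */
        glColor3ub(id & 0xFF, (id >> 8) & 0xFF, (id >> 16) & 0xFF);
        draw_polygon_flat(i);
    }

    glReadPixels(0, 0, screen_w, screen_h, GL_RGB, GL_UNSIGNED_BYTE, pixels);
    for (p = 0; p < screen_w * screen_h; p++) {
        int id = pixels[p*3] | (pixels[p*3+1] << 8) | (pixels[p*3+2] << 16);
        if (id > 0 && id <= num_polys)
            visible[id - 1] = 1;
    }
    free(pixels);
}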

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

Quote:

why don't we have the same on the mobo so we can have our own supercomputers? Or is it too specialized, maybe?

GPUs cost too much for motherboards. A chipset's selling price (in bulk) is on the order of 20 to 30 USD, and into that 30 USD you need to fit the whole chipset. There is little room (cost-wise) to put a fancy GPU there.

We do see the odd integrated GPU every now and then, but they're not very powerful: typically, they're about half the speed of the slowest discrete solution available at the time.

Quote:

ROAM is supposed to work very well with 3D acceleration

Unfortunately, it's not really hardware-friendly. You need to recreate your vertex and index buffers almost every frame, since you'll be generating a new polygon soup.

Plus, ROAM has that nasty feature of taking more time to compute the scene when your frame rate diminishes, which lowers your frame rate and causes more computations on the next frame, etc.

Quote:

I was also thinking about (in the software renderer) running the game with flat shading and automatic camera movement while I sleep: color each polygon a unique color and keep a database of which polygons are visible from which areas. I'd have to compress it to run-length bitfields; I haven't seen this done anywhere else.

You're probably better off doing some form of BSP instead.

--
- Bob
[ -- All my signature links are 404 -- ]

Thomas Harte
Member #33
April 2000

Quote:

Unfortunately, it's not really hardware-friendly. You need to recreate your vertex and index buffers almost every frame, since you'll be generating a new polygon soup.

I don't see how this is more true of ROAM than any other system of adaptive meshing? I also take it there is not yet an analogue to pixel/vertex shaders that allows you to programmatically create a vertex list?

I've never owned a card with any sort of programmable functionality, so please excuse my ignorance.

EDIT: but I did do quite a lot in the realm of full software 3d before obtaining a 3dfx Voodoo, too many years ago for me to be willing to remember. So don't feel the need to patronise.

Quote:

Actually, the situation is getting quite interesting. In the old times (and now too) most general-purpose computing (e.g. scene management) was done on the CPU and rendering on the GPU. I plan to reverse that: use a machine with two dual-core CPUs for rendering and a powerful GPU for managing the scene data.

I really don't see how this would be beneficial.

HoHo
Member #4,534
April 2004

Quote:

I really don't see how this would be beneficial.

GPUs are not very good for ray tracing, at least not for now. Their flexibility is practically nonexistent compared to CPUs. Building the space-partitioning tree takes a lot of FP power and could probably be handed off to the GPU, which would leave more CPU time for the CPUs to deal with rendering.

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Krzysztof Kluczek
Member #4,191
January 2004

Quote:

Building the space-partitioning tree takes a lot of FP power and could probably be handed off to the GPU, which would leave more CPU time for the CPUs to deal with rendering.

I don't think so. A GPU is built to handle a large number of short, very specific tasks in vertex and pixel shaders. It works more or less like this:
- the vertex shader processes every vertex (but vertices are processed completely independently)
- the rasterizer builds triangles and passes data to the pixel shaders
- the pixel shader processes every pixel and can look up textures in the process (but again, every pixel is processed independently)
- then some fixed functions common to most rendering happen: alpha test, stencil/Z-buffer test and blending
- finally the pixel is written to the frame buffer

It's probably possible to write a pixel shader that builds the tree, but it would be highly redundant, because a pixel shader has a very limited output (a few floating-point numbers per pixel, not enough to emit much of the tree). Also, pixel and vertex shaders probably don't have enough temporary memory to build a partitioning tree. GPUs are built to execute simple operations on loads of input data, but they aren't designed for complex tasks, unless the task is polygon rendering. :)

I think it would probably be easier to do the space partitioning on the CPU and the raytracing with pixel shaders on the GPU. :)

HoHo
Member #4,534
April 2004

So much for the original topic ;D

I haven't really studied how KD-tree building could be done on a GPU, but I know there are some very efficient sorting algorithms that run on GPUs.

Quote:

It's probably possible to write a pixel shader that builds the tree, but it would be highly redundant, because a pixel shader has a very limited output (a few floating-point numbers per pixel, not enough to emit much of the tree).

In a CPU KD-tree a single branch or leaf node takes 32 bits. There are usually 1-3 triangles per leaf and tens of thousands to several million triangles per scene. Not enough for me to start worrying about memory usage.

Quote:

I think it would probably be easier to do the space partitioning on the CPU and the raytracing with pixel shaders on the GPU. :)

There have been attempts to do so. Probably the latest and most successful one is described here (check the thesis).
IIRC, in an earlier attempt an R9800XT was about as fast as a P4 at 2.4-3 GHz. I can't directly compare the implementation described in the thesis, but it doesn't seem radically faster: on a 6800U it is perhaps 5-10x faster than a 3 GHz P4. And that is just pure rendering, no scene creation, which usually takes a long time compared to the rendering itself.

Also, as weird as it may sound, using shaders in a GPU-based ray tracer will probably be quite complicated.

Anyone interested in real-time ray tracing should check out OpenRT; you can register to get a noncommercial version for Linux there :)
If anyone gets the ICC version working, tell me how you did it. You can read about my adventures with it in the mailing list archive.
The GCC thingie should work pretty much out of the box.

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Bob
Free Market Evangelist
September 2000

Quote:

I don't see how this is more true of ROAM than any other system of adaptive meshing?

Some adaptive meshing schemes are better than others, but most of them are suboptimal. That is, there comes a time when NOT doing adaptive subdivision is faster than doing it, simply because GPUs are getting faster more quickly than CPUs are.

Quote:

I also take it there is not yet an analogue to pixel/vertex shaders that allows you to programmatically create a vertex list?

Not yet, but it should come soon enough. It's been one of the touted features of DX10.

Quote:

I know there are some very efficient sorting algorithms that run on GPUs.

They're not all that efficient. See Purcell et al. Sorting doesn't scale with computation, it scales with memory bandwidth (which grows much more slowly).

--
- Bob
[ -- All my signature links are 404 -- ]

HoHo
Member #4,534
April 2004

Quote:

They're not all that efficient. See Purcell et al. Sorting doesn't scale with computation, it scales with memory bandwidth (which grows much more slowly).

Have you seen this?

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Bob
Free Market Evangelist
September 2000

Quote:

Have you seen this?

Sure, but that doesn't disprove what I said above ;) (BTW, this paper was forwarded to me a few weeks ago by Tim Purcell, who incidentally has his desk adjacent to mine).

--
- Bob
[ -- All my signature links are 404 -- ]

Arthur Kalliokoski
Second in Command
February 2005

Quote:

You're probably better off doing some form of BSP instead

I'm trying to do landscapes, which I can't see working very well with BSP trees; on a hilltop near a corner of the map, you'd have most of the polygons visible in a single frame. I'm working toward a 3D car racing game.

I've gotten GL to do a "landscape" of colored triangles (I'll probably get textures into it in an hour or two). I can't get display lists to work right yet, so I'm still doing it with a loop passing parameters to GL functions. Still about 4x faster than my software renderer, though. Although this is only a 466 MHz Celeron, and when I've got 100K colored triangles on screen at once I'm only getting 5-8 fps... Still got a lot to learn here.

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

Quote:

I've gotten GL to do a "landscape" of colored triangles (I'll probably get textures into it in an hour or two). I can't get display lists to work right yet, so I'm still doing it with a loop passing parameters to GL functions. Still about 4x faster than my software renderer, though. Although this is only a 466 MHz Celeron, and when I've got 100K colored triangles on screen at once I'm only getting 5-8 fps... Still got a lot to learn here.

You'll want to use Vertex Buffer Objects instead of display lists or immediate mode. That's if you care about performance ;)
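Roughly, the idea is to upload the geometry once and then only issue a draw call per frame. Something like this, using the ARB_vertex_buffer_object entry points (the helper names are made up, the names lose the ARB suffix in GL 1.5, and on Windows you still have to fetch the function pointers with wglGetProcAddress() or your extension loader of choice):

#include <GL/gl.h>
#include <GL/glext.h>   /* tokens and typedefs for ARB_vertex_buffer_object */

/* Upload a static vertex array to the card once... */
GLuint create_static_vbo(const float *verts, int num_verts)
{
    GLuint vbo;
    glGenBuffersARB(1, &vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                    num_verts * 3 * sizeof(float), verts,
                    GL_STATIC_DRAW_ARB);
    return vbo;
}

/* ...then draw it every frame without touching the data again. */
void draw_static_vbo(GLuint vbo, int num_verts)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (void *)0);   /* offset into the VBO */
    glDrawArrays(GL_TRIANGLE_STRIP, 0, num_verts);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
}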

--
- Bob
[ -- All my signature links are 404 -- ]

Arthur Kalliokoski
Second in Command
February 2005

This computer has an Intel 810 chipset, and the ARB example in the Alleggl examples says it (or the drivers) doesn't support the ARB extension. :-/
Thanks for all the info though!

They all watch too much MSNBC... they get ideas.

HoHo
Member #4,534
April 2004

Quote:

and when I've got 100K colored triangles on screen at once I'm only getting 5-8 fps...

I think that's normal, considering that to my knowledge most Q3A levels had fewer triangles in total :P

If VBOs are not supported, then perhaps EXT_vertex_array can help you a bit. It's ugly to use compared to VBOs, but it should give some speed boost over immediate-mode rendering if the CPU is holding you back.

[edit]

Quote:

You'll want to use Vertex Buffer Objects instead of display lists or immediate mode

Funny, I've always thought display lists were the most efficient thing for static geometry. Are VBOs really faster for static stuff?

__________
In theory, there is no difference between theory and practice. But, in practice, there is - Jan L.A. van de Snepscheut
MMORPG's...Many Men Online Role Playing Girls - Radagar
"Is Java REALLY slower? Does STL really bloat your exes? Find out with your friendly host, HoHo, and his benchmarking machine!" - Jakub Wasilewski

Krzysztof Kluczek
Member #4,191
January 2004

Quote:

I think that's normal, considering that to my knowledge most Q3A levels had fewer triangles in total

I think that 200 000 OpenGL calls per frame (or more if he isn't using triangle strips) is more likely to slow it down. That's why vertex arrays were introduced.

Quote:

If VBOs are not supported, then perhaps EXT_vertex_array can help you a bit. It's ugly to use compared to VBOs, but it should give some speed boost over immediate-mode rendering if the CPU is holding you back.

EXT_vertex_array is part of the OpenGL 1.1 core, which means it should work everywhere. Also, vertex arrays have quite a similar interface to VBOs, which lets you create an intelligent vertex buffer class with a nice interface that falls back to vertex arrays when VBOs aren't supported.
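The draw path barely changes between the two; a rough sketch (have_vbo and the buffer handle are assumed to have been set up at init time):

#include <GL/gl.h>
#include <GL/glext.h>   /* for GL_ARRAY_BUFFER_ARB and friends */

/* Same draw call whether VBOs are available or not: with a VBO bound the
   "pointer" is an offset into the buffer, without one it's plain client
   memory that gets streamed to the card each frame. */
void draw_mesh(int have_vbo, GLuint vbo, const float *verts, int num_verts)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    if (have_vbo) {
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
        glVertexPointer(3, GL_FLOAT, 0, (void *)0);
    } else {
        glVertexPointer(3, GL_FLOAT, 0, verts);
    }
    glDrawArrays(GL_TRIANGLES, 0, num_verts);
    if (have_vbo)
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glDisableClientState(GL_VERTEX_ARRAY);
}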

Quote:

Are VBOs really faster for static stuff?

Yes, and for dynamic geometry too. :) They just give you a way to put data directly into GPU memory. :)

Arthur Kalliokoski
Second in Command
February 2005

My Windows DLLs don't have EXT_vertex_array, but they do have GL_EXT_compiled_vertex_array...

I see the Alleggl stuff has EXT_vertex_array, but I'm having way too much trouble with it on Windows; my Slackware-compiled version wouldn't respond to any input except Ctrl-Alt-Backspacing my way back to the console, and I need to get some GLUT stuff to get it to compile on my old Mandrake box so I can check it out there.

The 100K polygons reference was comparing to the NVIDIA demo thing, so (ignoring further optimizations I'm missing) the NVIDIA computer is about 7 times faster.

The car racing game will "skip" some vertices to make larger polygons in the distance to cut the total down. "Mipbumping?" Somewhat ROAM-like. The Allegro thing I put in Off-Topic Ordeals last spring had a crude version.
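Something along these lines is what I mean; the distances are made-up numbers, the point is just doubling the vertex stride each time a terrain block is far enough away:

/* Pick how many heightfield samples to skip for a terrain block, based on
   its distance from the camera: nearby blocks use every vertex (step 1),
   farther ones every 2nd, 4th, 8th sample. */
int block_step(float distance)
{
    int step = 1;
    float threshold = 50.0f;        /* first LOD switch, in world units */

    while (distance > threshold && step < 8) {
        step *= 2;                  /* halve the vertex density         */
        threshold *= 2.0f;          /* next switch twice as far away    */
    }
    return step;                    /* index stride into the heightfield */
}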

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

Quote:

My Windows DLLs don't have EXT_vertex_array

EXT_vertex_array is part of OpenGL 1.1. So if you have OpenGL 1.1, you don't need to check for EXT_vertex_array: it's already available.

--
- Bob
[ -- All my signature links are 404 -- ]

Arthur Kalliokoski
Second in Command
February 2005

So much to check, so little time...

I just reran the old Allegro thing on this Celeron; it got 12-13 fps @ 320x240x16 at the corner of a 128x vertex array, while the GL thing got 23 fps under the same conditions, and Z-buffering and backface culling didn't seem to slow it down. The Allegro thing had prerendered lighting on the slopes (with some sort of fencepost error, but that'd be trivial to fix) and there was an occasional clipping error. I've got to split the texture map up into small strips for GL; Thomas Harte was saying to do that for the software thing anyway, and it'd cache better too. I forget (already) what my own software stuff did on the AMD K6-2; it was much faster than Allegro, but fog etc. would have slowed it down again. I can't run it on this Celeron because the stupid VESA implementation sucks so badly and SciTech doesn't grok it, but GL is better even if I don't get all the fancy stuff down. There, I admit it!
Gotta go now, maybe back Monday.

[EDIT]
Some DLL with x810 something or other in the name had GL_EXT_compiled_vertex_array, so I suppose the Intel drivers are incomplete. NO DLL had EXT_vertex_array.

They all watch too much MSNBC... they get ideas.

Bob
Free Market Evangelist
September 2000

Quote:

NO DLL had EXT_vertex_array.

Yes, that's expected. I don't think there are any platforms that expose plain old OpenGL 1.0 (where EXT_vertex_array is meaningful). Windows is perpetually stuck at GL 1.1, which does mean that EXT_vertex_array is part of the core. That is, you can use the functions (glVertexPointer and family) just as you would any other GL function.

--
- Bob
[ -- All my signature links are 404 -- ]
