3-D "movies", as well as depth buffer capture.

Chris Katko

Member #1,881

January 2002

1) I just realized 3-D movies are encoded side-by-side or over-under, but both are just flat RGB images, sent to each eye. So there's no real depth values encoded. Are there any file formats (YouTube, a Nikon/whatever 3-D camera, Kinect) that do?

2) What about calculating depth information from these separate images? This seems impossible, but am I missing something? Is there some form of inverse transform function to get back to depth, with two assumed eye coordinates? Or is there a destructive step and too much missing information?

3) I can get depth information with a 3-D sensor such as the Kinect. That's simple enough for me, but I'm not aware of any file formats that encode a depth buffer. Are there video formats for that?

4) Given an OpenGL context, is there any easy way to capture the depth buffer? Both for cases A) I'm writing the program and B) I'm running a 3rd-party program.

Basically, I want access to video files where the RGB is supplemented with a depth buffer. I'm sure I can do that with a Kinect, but I don't have one at the moment, and even then that restricts me to what I capture by hand--as opposed to a wealth of existing media.

p.s. It also just occurred to me that the side-by-side pre-rendered 3-D setup they use makes an assumption regarding eye distance that will not hold for a large number of people, likely causing headaches.

-----sig:
“Programs should be written for people to read, and only incidentally for machines to execute.” - Structure and Interpretation of Computer Programs
"Political Correctness is fascism disguised as manners" --George Carlin

gnolam

Member #2,030

March 2002

Chris Katko said:

A) I'm writing the program

Trivial. Render the scene to an FBO with an attached depth buffer. Then do whatever it is that you want to do with said buffer.

Quote:

The assumption is "screw actual human eye distances, exaggerate the effect until we think it looks good".
The headaches that many people experience come from a focusing mismatch - the actual focus point (the flat screen) is not where your brain is telling you you should focus. Thus, eye strain.

--
Move to the Democratic People's Republic of Vivendi Universal (formerly known as Sweden) - officially democracy- and privacy-free since 2008-06-18!

Johan Halmén

Member #1,550

September 2001

1) & 2) Images are one thing. 3D models are another thing. 3D images are just two images of something. 3D images are not 3D models. Too little information. Although some clever logic could analyse one of the two images both pixel by pixel and contextually, find the same context in the other image, find the nearest matching pixel, and then calculate the angular difference and from that the distance. But I don't think speaking of actual stored depth info in images would make much sense.
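The "find the matching pixel, then calculate the distance" idea can be sketched roughly as below. This is a minimal illustration, not a production stereo matcher: it assumes the two images are already rectified (a point on row y in the left image lies on row y in the right image), that both cameras share the same focal length, and it uses a plain sum-of-absolute-differences window. All function and parameter names are made up for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Sum of absolute differences between two (2*r+1) x (2*r+1) windows of
 * 8-bit grayscale images, centred at (xl, y) on the left and (xr, y) on
 * the right. The caller guarantees both windows are inside the image. */
static long window_sad(const unsigned char *left, const unsigned char *right,
                       int width, int xl, int xr, int y, int r)
{
    long sum = 0;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            sum += labs((long)left[(y + dy) * width + xl + dx] -
                        (long)right[(y + dy) * width + xr + dx]);
    return sum;
}

/* For left-image pixel (x, y), search the same row of the right image for
 * the best-matching window. With rectified images, a point at column x on
 * the left appears at column x - d on the right, with 0 <= d <= max_disp.
 * Returns the disparity d in pixels, or -1 if the window does not fit. */
int match_disparity(const unsigned char *left, const unsigned char *right,
                    int width, int height, int x, int y, int r, int max_disp)
{
    assert(left != NULL && right != NULL);
    assert(width > 0 && height > 0 && r >= 0 && max_disp >= 0);
    if (x - r < 0 || x + r >= width || y - r < 0 || y + r >= height)
        return -1;

    long best_cost = LONG_MAX;
    int best_disp = -1;
    for (int d = 0; d <= max_disp && x - d - r >= 0; ++d) {
        long cost = window_sad(left, right, width, x, x - d, y, r);
        if (cost < best_cost) {
            best_cost = cost;
            best_disp = d;
        }
    }
    return best_disp;
}

/* Depth for an idealized rectified pair: Z = focal_px * baseline_m / d.
 * focal_px is the focal length in pixels, baseline_m the distance between
 * the camera centres in metres. Returns -1.0 when d is not positive
 * (no match, or a point effectively at infinity). */
double depth_from_disparity(double focal_px, double baseline_m, int disparity_px)
{
    assert(focal_px > 0.0 && baseline_m > 0.0);
    if (disparity_px <= 0)
        return -1.0;
    return focal_px * baseline_m / (double)disparity_px;
}
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated distance, so far-away points are resolved much more coarsely than near ones. Flat, textureless regions give many equally good matches, which is where the "guessing" comes in.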

3) What depth info? Surely not pixel by pixel, but more like one distance value to a huge moving blob that is pretending to be Pete Sampras. I saw a kid doing a 3D scan of himself using the Kinect. But the software clearly required him to rotate in front of the camera. So the software received a lot more info than just a stereo pair of two flat images.

Adding a depth channel to an RGB image just doesn't make sense. What on earth would that mean? That each pixel in a bitmap would have some distance to the viewer? You still need the info that the two images in a stereo pair give you. You need to know what's behind one pixel, because the other eye might see it.

And I don't think the "wrong" eye distance causes headaches. It just scales everything. If your eye distance is 55 mm and the Hobbit was filmed with a 3D camera with a lens separation of 70 mm, Bilbo just looks smaller. There are other things that cause headaches. Head tilting causes your eyes to twist unnaturally while trying to keep the two images together. Badly corrected barrel distortion at the edges does harm, too.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Years of thorough research have revealed that the red "x" that closes a window, really isn't red, but white on red background.

Years of thorough research have revealed that what people find beautiful about the Mandelbrot set is not the set itself, but all the rest.

Chris Katko

Member #1,881

January 2002

Johan Halmén said:

Adding a depth channel to an RGB image just doesn't make sense. What on earth would that mean?

It means exactly what it means. Distance to the camera.

[Image: kyle_kinect.jpg]

m c

Member #5,337

December 2004

But it'd still look 2d from the exact view of the camera.

It isn't until you rotate it that you will see the 3d, but then you will also see the gaps.

You need the 2d projection from two different points of view, and then feed one into each eye, preferably.

Attaching a depth buffer would only help with animated lighting tricks, like bump mapping that varies over multiple frames of a still 2d+depth image to highlight the 3d depth contours. Something like that has probably been done in demos.

Also, that would help in snipping a subject out of a photo, because you could select by depth instead of just by colour. You wouldn't need a green screen; you could just stand apart from other objects.
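Selecting by depth amounts to a simple range test per pixel. A minimal sketch, assuming the depth image is an array of distances in metres and that non-positive values mean "no reading" (sensors differ in how they flag invalid pixels); the function name is made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Build a binary mask from a depth image: 1 where the pixel's distance is
 * within [near_m, far_m], 0 elsewhere. depth_m holds one distance per pixel
 * in metres; values <= 0 are treated as invalid and masked out.
 * Returns the number of pixels kept. */
size_t depth_key_mask(const float *depth_m, size_t count,
                      float near_m, float far_m, unsigned char *mask)
{
    size_t kept = 0;
    assert(depth_m != NULL && mask != NULL && near_m <= far_m);
    for (size_t i = 0; i < count; ++i) {
        int inside = depth_m[i] > 0.0f &&
                     depth_m[i] >= near_m && depth_m[i] <= far_m;
        mask[i] = (unsigned char)inside;
        kept += (size_t)inside;
    }
    return kept;
}
```

A person standing 1 to 2 m from the camera could then be cut out with `depth_key_mask(depth, n, 1.0f, 2.0f, mask)` and the mask applied to the colour image. Real depth maps are noisy at object edges, so the mask would likely need some cleanup.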

And it would be easier to adjust the lighting of the person to match the lighting of the cg background because with depth information you know where to put highlights and shadows, so you could have cheaper + easier + more immersive green screen replacement.

That is the only use of a 2d camera that also gives depth buffer.

Otherwise you want two 2d cameras side by side, just like you have two microphones side by side to give an immersive sound stage.

And who cares about scanning 3d models out of video footage, you can do that with 1x 2d camera with no depth buffer, just take a series of shots around every angle. Do you work for the NSA big data compression where they want to turn petabytes of CCTV footage into gigabytes of blender project files or something? Oh I know you want to 3d film a girl next door and then make blender porn

(\ /)
(O.o)
(> <)

Chris Katko

Member #1,881

January 2002

m c said:

But it'd still look 2d from the exact view of the camera.

I am not trying to get anything that matters to a human. I'm trying to get distance data.

Quote:

Do you work for the NSA big data compression where they want to turn petabytes of CCTV footage into gigabytes of blender project files or something? Oh I know you want to 3d film a girl next door and then make blender porn

I did ~6 months of a Ph.D. on robotics, LIDAR, and RGBD systems before leaving for health reasons.

During that time, we did lots of investigation into the current progress of RGBD cameras for the purpose of localization and mapping.

So all that said, I'm aware of what can be done. I'm just looking for data for some purely software ideas right now.

[edit] Come to think of it, compiled RGBD datasets of objects exist, so it stands to reason that people likely publish unconverted RGBD streams too.

For one example:

http://www0.cs.ucl.ac.uk/staff/M.Firman/RGBDdatasets/

And this project:

http://www.rgbdtoolkit.com/projects.html

But it sure would have been nice to get some access to 3-D entertainment media.

m c

Member #5,337

December 2004

Use the drone to make blender porn of everyone down the street.

Wouldn't triangulating from side-by-side back to 3d give mirrored depths on either side of the focal point?

Ben Delacob

Member #6,141

August 2005

There are a number of projects that extract 3D scenes from 2d images, though I don't know of any readily available video with a depth buffer. Try searching for "3d reconstruction", perhaps narrowed down with "video" or "multiple images". There are even ones like this from Cornell that use a single image and make guesses at the depth.

__________________________________
Allegro html mockup code thread -website-
"two to the fighting eighth power"

Johan Halmén

Member #1,550

September 2001

Chris Katko said:

It means exactly what it means. Distance to the camera.

Ok, I get it. It's just hard to not think of 3D-images as soon as someone is talking about depth info in images. Even this image of yours:

[Image: kyle_kinect.jpg]

...is kind of a 3D rendition of a 2D image with depth info, right? Or a 2D rendition of a 3D object created from a 2D image with depth info. Looking at the picture, it makes perfect sense. The light source seems to be the perspective point of the original image. The shadow of the arm on the background is what is hidden in the original image, and we have no info about what could be there.

Ben Delacob said:

There are even ones like this from Cornell that use a single image and make guesses at the depth.

"Guess" is the keyword, even if you use a stereo pair of images. You need a clever pattern recognition algorithm. If you have a stereo pair of images, look at them through a stereoscope, and can perceive the 3D image, there should be a way to do the pixel matching with an algorithm, too. The more complex the images are, the harder it is to find the match, but on the other hand, the harder it is for your eyes, too, to perceive the 3D image.

Chris Katko

Member #1,881

January 2002

To clarify, if you looked at that picture straight on, it would be exactly what you're thinking of. It has been turned into a point cloud and rotated to show shadows and depth more clearly:

[Image: kylekinect.jpg]
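For reference, turning a depth image into a point cloud like that is a per-pixel back-projection. A minimal sketch, assuming a pinhole camera model with known intrinsics (focal lengths fx, fy and principal point cx, cy, all in pixels, normally obtained from calibration) and depth stored as distance along the optical axis in metres; the names are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Point3;

/* Back-project a width x height depth image into camera-space points.
 * depth_m: distance along the optical axis per pixel, in metres
 * fx, fy:  focal lengths in pixels; cx, cy: principal point in pixels
 * Pixels with depth <= 0 are treated as invalid and skipped.
 * out must have room for width*height points.
 * Returns the number of points written. */
size_t depth_to_points(const float *depth_m, int width, int height,
                       float fx, float fy, float cx, float cy, Point3 *out)
{
    size_t n = 0;
    assert(depth_m != NULL && out != NULL);
    assert(width > 0 && height > 0 && fx > 0.0f && fy > 0.0f);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            float z = depth_m[v * width + u];
            if (z <= 0.0f)
                continue;
            out[n].x = ((float)u - cx) * z / fx;
            out[n].y = ((float)v - cy) * z / fy;
            out[n].z = z;
            ++n;
        }
    }
    return n;
}
```

Rotating the resulting points and re-rendering them gives the kind of view shown above, including the empty "shadow" regions where the camera had no line of sight.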

Here's what an unhacked Kinect actually gives you:

[Image: depthImage.jpg]

It automatically computes zones. But I believe you can run them in uncorrected depth-mode, more akin to this:

[Image: DepthImages.gif]

Ideally, I want a video source with this in the data stream:

[Image: depthbuffer.jpg]

Erin Maus

Member #7,537

July 2006

I may have just what you need to capture an OpenGL depth buffer that you normally don't have access to.

I've been working on a hook API used for a very similar purpose. I was curious about the model and scene data in a game I play. Some of it is stored in some uber-proprietary-unknown format, while the scene stuff is generated by a server (it's online). Rather than reverse engineer that crap (no one has time for that!) I wrote a pretty decent hook API for Windows and used it to dump the mesh data on certain frames (aka when I press a button). In the end, I want to create high quality versions of scenes, tweaking the data, etc, in order to create a virtual diorama.

So far what the hook does is pretty specific for the game I play and it's not the most efficient bunch of code, but the hook API is extremely efficient and effective. Just for fun I analyzed the game and figured out how to dump the depth buffer:

[Image: 609209]

Each game manages the depth buffer differently in a sense, however. For example, the game in the above screenshot renders into a framebuffer, and at some point copies the color data into another framebuffer and then discards the depth buffer contents. With multisampling enabled, the framebuffer uses a multisampled depth buffer stored in a renderbuffer object, and thus using something like glReadPixels fails--I had to create my own framebuffer and use glBlitFramebuffer, and then read the depth data.

So always keep in mind that, regardless of which method you use, you need to know how the game renders. Use something like apitrace to determine when and how you should capture the depth buffer.

Although it's not too hard to develop a Windows hook and intercept graphics data, I can provide the source if it interests you. I need a reason to upload it to GitHub anyway :p.

---
ItsyRealm, a quirky 2D/3D RPG where you fight, skill, and explore in a medieval world with horrors unimaginable.
they / she

Chris Katko

Member #1,881

January 2002

Aaron Bolyard said:

Although it's not too hard to develop a Windows hook and intercept graphics data, I can provide the source if it interests you. I need a reason to upload it to GitHub anyway :p.

I'd definitely like to see that, at least for reference! Did you base this on some other available tools or articles, or just come up with it on your own?

Erin Maus

Member #7,537

July 2006

Chris Katko said:

I'd definitely like to see that, at least for reference! Did you base this on some other available tools or articles, or just come up with it on your own?

I used some documentation on MSDN. Initially, I wrote this library a couple years back now, so I have no idea where to start looking for the article. The original code was functional and documented, but not the most user-friendly. I refactored it in the past month.

Here's the GitHub repo: https://github.com/aaronbolyard/capn

To read the depth buffer from the currently bound framebuffer (and only if the framebuffer is not multisampled), you'd do something like this:

void GetDepthBuffer(GLint width, GLint height, GLfloat* values)
{
  glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, values);
}

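Easy, right? Well, using those depth values is another thing. I'm not sure how you want them, but the values in the depth buffer are not linear. You need to know the near and far planes, and then you can convert each depth fragment into a linear depth. glReadPixels returns values in [0, 1], so remap to [-1, 1] first (this assumes a perspective projection and the default glDepthRange). The result is a fraction of the far distance, so multiply by zFar to get the distance from the camera. (I've written zNear/zFar because near and far are macros in the Windows headers.)

GLfloat ndcDepth = 2.0f * depthFragment - 1.0f;
GLfloat linearDepthFragment = (2.0f * zNear) / (zFar + zNear - ndcDepth * (zFar - zNear));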

You can get the near/far planes from a projection matrix, if you don't have any access to these values otherwise. I don't remember the formula off hand, nor do I have it written down, but it's rather simple (it's the idea of frustum culling). Search for "extract planes from projection matrix" or something, I guess.
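For the common case, recovering the planes does not need full plane extraction. A minimal sketch, assuming a column-major matrix of the form produced by glFrustum or gluPerspective, where m[10] = -(f+n)/(f-n) and m[14] = -2fn/(f-n); orthographic, infinite-far, or reversed-Z projections need different formulas. The function names are made up for the example, and the second function assumes the default glDepthRange of [0, 1]:

```c
#include <assert.h>
#include <stddef.h>

/* Recover the near and far plane distances from a standard OpenGL
 * perspective projection matrix stored in column-major order.
 * With a = m[10] and b = m[14]:  near = b / (a - 1),  far = b / (a + 1). */
void near_far_from_projection(const float m[16], float *z_near, float *z_far)
{
    assert(m != NULL && z_near != NULL && z_far != NULL);
    float a = m[10];
    float b = m[14];
    *z_near = b / (a - 1.0f);
    *z_far  = b / (a + 1.0f);
}

/* Convert a window-space depth value in [0, 1], as read back with
 * GL_DEPTH_COMPONENT, to a positive eye-space distance along the view
 * axis, in the same units as z_near and z_far. */
float linear_eye_depth(float depth, float z_near, float z_far)
{
    assert(z_near > 0.0f && z_far > z_near);
    float ndc = 2.0f * depth - 1.0f; /* window [0, 1] -> NDC [-1, 1] */
    return (2.0f * z_near * z_far) /
           (z_far + z_near - ndc * (z_far - z_near));
}
```

With near = 1 and far = 3, a stored depth of 0.5 maps to a distance of 1.5 rather than the midpoint 2, which shows how the depth buffer spends most of its precision close to the camera. Unlike the one-liner above, this version returns the distance in scene units rather than as a fraction of the far plane.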
