There have been a lot of crazy many-camera rigs created for the purpose of capturing full spatial video. I recall a conversation at Meta that was basically “we are going to lean in as hard as possible on classic geometric computer vision before looking at machine learning algorithms,” and I was supportive of that direction. That was many years ago, when ML still felt like unpredictable alchemy, and of course you want to maximize your use of the ground truth! Hardcore engineering effort went into camera calibration, synchronization, and data processing, but it never really delivered on the vision.

No matter how many cameras you have, any complex moving object is going to have occluded areas, and “holes in reality” stand out starkly to a viewer who isn’t exactly at one of the camera points. Even when you have good visibility, the ambiguities in multi-camera photogrammetry make things less precise than you would like. There were also experiments to see how good the 3D scene reconstruction from the Quest cameras could get with offline compute, and the answer was still “not very good,” with quite lumpy surfaces. Lots of 3D reconstructions look amazing scrolling by in the feed on your phone, but not so good blown up to a fully immersive VR rendering and put in contrast to a high quality traditional photo.

You really need strong priors to drive the fitting problem and fill in coverage gaps. For architectural scenes, you can get some mileage out of simple planar priors, but modern generative AI is the ultimate prior.

Even if the crazy camera rigs had fully delivered on the promise, they still wouldn’t have enabled a good content ecosystem. YouTube wouldn’t have succeeded if every creator needed a RED Digital Cinema camera. The (quite good!) stereoscopic 3D photo generation in Quest Instagram is a baby step towards the future. There are paths to stereo video and 6DOF static, then eventually to 6DOF video. Make everything immersive, then allow bespoke tuning of immersive-aware media.
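To make the planar-prior point above concrete, here is a minimal toy sketch in NumPy (not anything shipped at Meta; the function names and the 80% snap weight are my own assumptions) of how a fitted plane can both smooth a lumpy reconstruction of a wall and hallucinate depth for a coverage hole:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # SVD of the centered points; the last right singular vector is the normal.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def apply_planar_prior(points, weight=0.8):
    """Pull noisy reconstructed points toward the fitted plane.

    weight=0 leaves points untouched, weight=1 snaps them perfectly flat.
    """
    centroid, normal = fit_plane(points)
    dist = (points - centroid) @ normal          # signed distance to the plane
    return points - weight * dist[:, None] * normal

def fill_holes(points, query_xy):
    """Fill unobserved (x, y) locations with depth sampled from the plane.

    Assumes the plane is not vertical in z (normal[2] != 0).
    """
    centroid, normal = fit_plane(points)
    # Solve n . (p - c) = 0 for z given (x, y).
    z = centroid[2] - (normal[0] * (query_xy[:, 0] - centroid[0]) +
                       normal[1] * (query_xy[:, 1] - centroid[1])) / normal[2]
    return np.column_stack([query_xy, z])

# Toy example: a noisy, partially observed wall at z = 2 + 0.1 * x.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 4, size=(200, 2))
z = 2 + 0.1 * xy[:, 0] + rng.normal(0, 0.05, 200)   # "lumpy" reconstruction
observed = np.column_stack([xy, z])

smoothed = apply_planar_prior(observed)              # regularized surface
patched = fill_holes(observed, np.array([[1.0, 3.5], [2.5, 0.5]]))  # gap fill
```

A learned generative prior plays the same structural role as the plane here, just with vastly more expressive power: it constrains the fit where the data is ambiguous and supplies plausible geometry where there is no data at all.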