Novel View Synthesis with Depth Reprojection

Detection, Video & Advanced Vision DS practice problem on Onlearn.

Difficulty: medium.

Topics: Novel View Synthesis with Depth Reprojection, Epipolar Constraints, Homography Matrix, Bilinear Interpolation, Depth Map Estimation, Camera Intrinsic Calibration, Computer Vision, Differential Geometry, Numerical Optimization, Signal Processing, Linear Algebra, Multi-View Geometry, Differentiable Rendering, Photogrammetry, Image Warping, Feature Correspondence.

Task: Novel View Synthesis via Depth-Based Reprojection

A core technique in 3D computer vision is synthesizing how a scene would look from a different camera viewpoint, given a single source image and its per-pixel depth map. This is known as novel view synthesis via depth reprojection. The idea is that if you know the depth at every pixel, you can lift each pixel into 3D space, apply a rigid camera transformation, and project the resulting 3D point onto a new image plane.

Your Task

Implement the function novel_view_synthesis(source_image, depth_map, K_src, K_tgt, R, t) that:

1. Takes a source image of shape (H, W, C), a depth map of shape (H, W), source and target camera intrinsic matrices K_src and K_tgt (both 3x3), a rotation matrix R (3x3), and a translation vector t of shape (3,).
2. For each source pixel (u, v) with positive depth d:
   - computes the corresponding 3D point in the source camera frame;
   - transforms that point into the target camera frame using R and t;
   - projects it onto the target image plane using K_tgt;
   - maps it to the nearest integer pixel coordinate in the target image.
3. Uses a depth buffer (z-buffer) to handle occlusions: when multiple source pixels project to the same target pixel, the one with the smallest depth in the target frame is kept.
4. Skips pixels with zero or negative depth. Points that land behind the target camera (non-positive z after the transformation) are also skipped.
5. Returns the synthesized target image of shape (H, W, C) as a float array initialized to zeros, so target pixels that receive no projected source pixel remain zero.

Use int(round(...)) to convert projected floating-point coordinates to integer pixel positions.
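The per-pixel geometry in step 2 can be written out explicitly. This is a sketch under the standard pinhole conventions, which the task implies but does not spell out: u is the column index, v is the row index, and R, t map source-camera coordinates to target-camera coordinates.

```latex
\begin{aligned}
\mathbf{X}_{src} &= d\, K_{src}^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \\
\mathbf{X}_{tgt} &= R\, \mathbf{X}_{src} + \mathbf{t}, \qquad z' = (\mathbf{X}_{tgt})_3 > 0, \\
\tilde{\mathbf{p}} &= K_{tgt}\, \mathbf{X}_{tgt}, \qquad
u' = \operatorname{round}\!\left(\tilde{p}_1 / \tilde{p}_3\right), \quad
v' = \operatorname{round}\!\left(\tilde{p}_2 / \tilde{p}_3\right).
\end{aligned}
```

With a conventional intrinsic matrix whose bottom row is (0, 0, 1), the third homogeneous component of p̃ equals z', so the perspective divisor and the z-buffer key are the same number. Note also that Python's round() rounds exact halves to the nearest even integer: with K_src = K_tgt = I, R = I, and t = (1, 0, 0), the pixel (u, v) = (2, 0) at depth d = 2 projects to u' = (2 * 2 + 1) / 2 = 2.5, which int(round(2.5)) maps to 2, not 3.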
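A straightforward per-pixel sketch of the function in NumPy follows. It is one possible implementation, not a reference solution, and it assumes that u indexes columns and v indexes rows, that X_tgt = R @ X_src + t, that K_tgt has bottom row (0, 0, 1), and that projections falling outside the H x W target grid are discarded (the task implies this last point but does not state it).

```python
import numpy as np


def novel_view_synthesis(source_image, depth_map, K_src, K_tgt, R, t):
    """Forward-warp a source image into a target view using per-pixel depth.

    Assumptions (not fixed by the task text): u is the column index and v the
    row index, X_tgt = R @ X_src + t, K_tgt has bottom row (0, 0, 1), and
    projections outside the H x W target grid are discarded.
    """
    image = np.asarray(source_image, dtype=np.float64)
    depth = np.asarray(depth_map, dtype=np.float64)
    K_src_inv = np.linalg.inv(np.asarray(K_src, dtype=np.float64))
    K_tgt = np.asarray(K_tgt, dtype=np.float64)
    R = np.asarray(R, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64).reshape(3)

    H, W, C = image.shape
    target = np.zeros((H, W, C), dtype=np.float64)  # unfilled pixels stay 0
    z_buffer = np.full((H, W), np.inf)  # nearest target-frame depth seen so far

    for v in range(H):          # row index
        for u in range(W):      # column index
            d = depth[v, u]
            if d <= 0:
                continue  # zero or negative depth: skip
            # Lift to 3D in the source camera frame: X_src = d * K_src^-1 [u, v, 1]^T
            X_src = d * (K_src_inv @ np.array([u, v, 1.0]))
            # Rigid transform into the target camera frame
            X_tgt = R @ X_src + t
            z = X_tgt[2]
            if z <= 0:
                continue  # behind the target camera: skip
            # Project with the target intrinsics and divide by the homogeneous coordinate
            p = K_tgt @ X_tgt
            u_t = int(round(p[0] / p[2]))
            v_t = int(round(p[1] / p[2]))
            if not (0 <= u_t < W and 0 <= v_t < H):
                continue  # falls outside the target image
            # z-buffer test: keep the closest surface at this target pixel
            if z < z_buffer[v_t, u_t]:
                z_buffer[v_t, u_t] = z
                target[v_t, u_t] = image[v, u]
    return target


if __name__ == "__main__":
    img = np.arange(6, dtype=float).reshape(2, 3, 1)
    # K = I, unit depth, t = (1, 0, 0): every pixel moves one column to the right
    out = novel_view_synthesis(img, np.ones((2, 3)), np.eye(3), np.eye(3),
                               np.eye(3), np.array([1.0, 0.0, 0.0]))
    print(out[..., 0])  # rows: [0, 0, 1] and [0, 3, 4]
```

Because the z-buffer test uses a strict <, ties in target depth keep whichever source pixel is visited first in row-major order. The double loop does O(HW) work at the Python level, which is fine for small test images but slow for full-resolution frames.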
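Step 3 can also be resolved without an explicit per-pixel comparison. The hypothetical helper splat_with_zbuffer below sketches one common vectorized alternative in NumPy: sort the projected points by target depth with a stable sort, then keep the first hit at each target pixel via np.unique(..., return_index=True). It assumes the projection step has already produced integer target coordinates (u_t, v_t), target-frame depths z, and colors listed in source row-major order. With a stable sort, ties in depth resolve to the earliest source pixel, which matches a strict < comparison applied in row-major order.

```python
import numpy as np


def splat_with_zbuffer(u_t, v_t, z, colors, H, W):
    """Hypothetical helper: resolve occlusions for already-projected points.

    u_t, v_t: integer target columns/rows, shape (N,); z: target-frame
    depths, shape (N,); colors: (N, C) source colors in source row-major order.
    """
    u_t = np.asarray(u_t, dtype=np.int64)
    v_t = np.asarray(v_t, dtype=np.int64)
    z = np.asarray(z, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    target = np.zeros((H, W, colors.shape[1]), dtype=np.float64)

    # Drop points that are behind the camera or outside the image
    ok = (z > 0) & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)
    u_t, v_t, z, colors = u_t[ok], v_t[ok], z[ok], colors[ok]

    # Stable sort by depth: equal depths keep their original (row-major) order
    order = np.argsort(z, kind="stable")
    flat = (v_t * W + u_t)[order]
    # First occurrence of each target pixel in sorted order has the smallest depth
    _, first = np.unique(flat, return_index=True)
    winners = order[first]
    target[v_t[winners], u_t[winners]] = colors[winners]
    return target
```

Sorting costs O(N log N) but moves the occlusion test out of Python loops; the per-pixel loop is simpler to read and to check against the task statement.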