
## Semantic Structure from Motion

Paper by: Sid Yingze Bao and Silvio Savarese. Presentation by: Ian Lenz.

### Inferring 3D

- With special hardware: range sensor, stereo camera
- Without special hardware: local features/graphical models (Make3D, etc.), structure from motion

### Structure from Motion

- Obtain 3D scene structure from multiple images taken by the same camera in different locations and poses
- Typically, camera location and pose are treated as unknowns
- Track points across frames, then infer camera pose and scene structure from the correspondences


### Intuition and Typical Approaches

- Fit a model of 3D points and camera positions to the observed 2D points
- Use point matches (e.g. SIFT)
- Use RANSAC or a similar method to fit models
- Often a complicated pipeline, e.g. Building Rome in a Day
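To make the RANSAC step concrete, here is a minimal sketch on a toy problem: fitting a 2D line to points contaminated by outliers. It is not the SfM pipeline itself, where the model would be camera geometry rather than a line. The data, the threshold, and the function names (`fit_line`, `ransac_line`) are assumptions made up for illustration.

```python
import random

def fit_line(p, q):
    """Return (slope, intercept) of the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, iterations=200, threshold=0.1, seed=0):
    """Keep the two-point line hypothesis that explains the most points."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)          # minimal sample for a line
        if p[0] == q[0]:
            continue                          # vertical line, skip this hypothesis
        slope, intercept = fit_line(p, q)
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers

# Hypothetical data: ten points on y = 2x + 1 plus four gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(1.5, 9.0), (3.5, -4.0), (6.5, 30.0), (8.5, 0.0)]
model, inliers = ransac_line(points)
print(model, len(inliers))
```

The same hypothesize-and-verify loop applies when the minimal sample is a set of point matches and the model is a camera configuration; only the fitting and residual functions change.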

### Semantic SfM

- Use semantic object labels to inform SfM
- Use SfM to inform semantic object labels
- Hopefully, improve results by modeling both together

### High-level Approach

- Maximum likelihood estimation
- Given: object detection probabilities at various poses, 2D point correspondences
- Model the probability of the observed images given the inferred parameters
- Use Markov Chain Monte Carlo to maximize it

### Model Parameters

C: camera parameters

- Ck: parameters for camera k, Ck = {Kk, Rk, Tk}
- K: internal camera parameters (known)
- R: camera rotation (unknown)
- T: camera translation (unknown)

### Model Parameters

q: 2D points (known)

- qki: ith point in camera k, qki = {x, y, a}ki
- x, y: point location
- a: visual descriptor (SIFT, etc.)

Q: 3D points (unknown)

- Qs = (Xs, Ys, Zs), in world-frame coordinates

u: point correspondences (known)

- uki = s if qki corresponds to Qs

o: camera-space object detections (known)

- okj: jth object detection in camera k, okj = {x, y, w, h, θ, φ, c}kj
- x, y: 2D location
- w, h: bounding box size
- θ, φ: 3D pose
- c: class (car, person, keyboard, etc.)

### Model Parameters

O: 3D objects (unknown)

- Ot = (X, Y, Z, Θ, Φ, c)t
- Similar to o, except with no bounding box and with an added Z coordinate

### Likelihood Function

- Assumption: points are independent from objects
- Why? It splits the likelihood into a point term and an object term, which makes inference easier; otherwise a complicated model of object 3D appearance would be required
- Camera parameters appear in both terms

### Point Term

- Computed by measuring agreement between predicted and actual measurements
- Predictions are computed by projecting 3D points into each camera
- Predicted and actual locations are assumed to differ by Gaussian noise
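The point term can be sketched as follows: project a 3D point Q through a camera {K, R, T} and score the observed 2D location under Gaussian noise. This is an illustrative sketch, not the paper's code. The pinhole convention (camera coordinates = R·Q + T), the isotropic noise with standard deviation `sigma`, the function names, and the example camera values are all assumptions.

```python
import math

def project(K, R, T, Q):
    """Project world point Q into pixel coordinates using camera {K, R, T}."""
    # Camera-frame coordinates: R * Q + T
    cam = [sum(R[i][j] * Q[j] for j in range(3)) + T[i] for i in range(3)]
    # Homogeneous pixel coordinates: K * cam
    hom = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return hom[0] / hom[2], hom[1] / hom[2]

def point_log_likelihood(K, R, T, Q, q, sigma=1.0):
    """Log of an isotropic 2D Gaussian on the reprojection error of q."""
    u, v = project(K, R, T, Q)
    sq_err = (u - q[0]) ** 2 + (v - q[1]) ** 2
    return -sq_err / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)

# Hypothetical camera: focal length 500 px, principal point (320, 240),
# identity rotation, zero translation.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]
Q = (0.0, 0.0, 5.0)

print(project(K, R, T, Q))
print(point_log_likelihood(K, R, T, Q, (320.0, 240.0)))
print(point_log_likelihood(K, R, T, Q, (325.0, 240.0)))
```

Summing this log-likelihood over all matched points and cameras would give the full point term under the independence assumption stated above.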

### Point Term (Alternative)

- Take qki and qlj as matching points from cameras Ck and Cl
- Determine the epipolar line of qki with respect to Cl
- Measure the distance from qlj to this line
- Also consider appearance similarity

### Object Term

- Also uses agreement between predictions and measurements
- Projection is more difficult
- Recall: a 3D object is parameterized by XYZ coordinates, orientation, and class; a 2D detection also has bounding box parameters

### Projecting 3D->2D object

- Location and pose are easy to obtain using the camera parameters
- For bounding box width and height:
  - fk: camera focal length
  - W, H: mapping from the object bounding cube to the bounding box, learned with an ML regressor from ground-truth 3D object bounding cubes and the corresponding observations
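For the alternative point term, the point-to-epipolar-line distance can be sketched as below. The sketch assumes a fundamental matrix `F` relating the two cameras is already available (it would be derived from Ck and Cl); the slides do not say how it is formed. The function names and the example `F`, which corresponds to a pure sideways translation with identity intrinsics, are illustrative assumptions.

```python
import math

def epipolar_line(F, q):
    """Homogeneous line l = F * (x, y, 1) in the second image for point q in the first."""
    x, y = q
    return [F[i][0] * x + F[i][1] * y + F[i][2] for i in range(3)]

def epipolar_distance(F, q_k, q_l):
    """Perpendicular distance from q_l to the epipolar line induced by q_k."""
    a, b, c = epipolar_line(F, q_k)
    return abs(a * q_l[0] + b * q_l[1] + c) / math.hypot(a, b)

# Hypothetical F for a camera pair related by translation along x
# (identity rotation and intrinsics): epipolar lines are horizontal.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

print(epipolar_distance(F, (0.2, 0.3), (0.5, 0.3)))  # match on the same row
print(epipolar_distance(F, (0.2, 0.3), (0.5, 0.7)))  # match off the epipolar line
```

A small distance means the pair is geometrically consistent with the two cameras, so this term needs no explicit 3D point.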

### Object Probability

- Scale is proportional to bounding box size
- Pose and scale are highly quantized
- Stack the probability maps as a tensor, indexed by pose and scale
- The tensor is denoted χ (chi)

### Object Term

- The probability of an object observation is proportional to the probability of not failing to see it in every image (yes, a double negative)
- Why do it this way? Under occlusion, the probability of not seeing the object in that image is 1, so the occluded image doesn't affect the likelihood term

### Estimation

- We have a model; how do we maximize it?
- Answer: Markov Chain Monte Carlo
- Estimate new parameters from the current ones
- Accept depending on the ratio of new to old probability
- Two questions remain: what are the initial parameters, and how do we update?
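The double negative in the object term can be sketched as one minus the product of per-image miss probabilities. The function name `object_term` and the example probabilities are made up for illustration, and the proportionality constant is ignored.

```python
def object_term(per_image_probs):
    """1 - prod_k (1 - p_k): chance the object is detected in at least one image."""
    miss = 1.0
    for p in per_image_probs:
        miss *= (1.0 - p)   # probability of not seeing the object in image k
    return 1.0 - miss

# Hypothetical detection probabilities for one object across cameras.
print(object_term([0.5, 0.5]))
# An occluded view (p = 0) multiplies the miss probability by 1,
# so it leaves the term unchanged.
print(object_term([0.8]), object_term([0.8, 0.0]))
```

A plain product of per-image detection probabilities would instead be driven to zero by a single occluded view, which is the behavior this form avoids.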

### Initialization

Camera location/pose, two approaches:

- Point-based: use a five-point solver to compute camera parameters from five corresponding points. Scale is ambiguous, so randomly pick several
- Object-based: form possible object correspondences between frames and initialize the cameras using these

### Initialization

Object and point locations:

- Use the estimated camera parameters (previous slide)
- Project points and objects from 2D to 3D
- Merge objects that get mapped to similar locations
- Determine 2D-3D correspondences (u)

### Update

- Order: C, O, Q (updated versions: C', O', Q')
- Pick C' with Gaussian probability around C
- Pick O' to maximize Pr(o|O', C') (within a local area of O)
- Pick Q' to maximize Pr(q, u|Q', C'), unless the alternative term was used
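The propose-and-accept loop can be sketched with a generic Metropolis sampler on a toy one-dimensional target. This stands in for the camera update only: the state here is a single scalar with a Gaussian proposal around the current value, whereas the method above proposes C' and then picks O' and Q' by maximization. The target density, step size, and names (`metropolis`, `toy_log_prob`) are assumptions.

```python
import math
import random

def metropolis(log_prob, init, steps=5000, step_size=0.5, seed=0):
    """Sample from exp(log_prob) using Gaussian proposals around the current state."""
    rng = random.Random(seed)
    current, current_lp = init, log_prob(init)
    samples = []
    for _ in range(steps):
        proposal = rng.gauss(current, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, new / old), computed in log space.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

def toy_log_prob(x):
    """Hypothetical target: unnormalized Gaussian centered at 3."""
    return -0.5 * (x - 3.0) ** 2

samples = metropolis(toy_log_prob, init=0.0)
kept = samples[500:]                 # drop early samples as burn-in
print(sum(kept) / len(kept))
```

Because rejected proposals repeat the current state, regions of high probability are visited more often, which is what makes the later clustering of samples meaningful.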

### Algorithm and Obtaining Results

- Intuition: MCMC visit probability is proportional to the probability function (what we're trying to maximize)
- Cluster the MCMC samples using MeanShift
- The cluster with the most corresponding samples wins
- Read out Q, O, C as the average over that cluster

### Results

http://www.eecs.umich.edu/vision/projects/ssfm/index.html
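The read-out step (cluster the MCMC samples with mean shift, take the largest cluster, average it) can be sketched in one dimension with a flat kernel. Real samples would be high-dimensional parameter vectors. The bandwidth, the sample values, and the function names are assumptions chosen for illustration.

```python
def mean_shift_1d(samples, bandwidth=1.0, iterations=50):
    """Move each sample toward the mean of the samples within `bandwidth` of it."""
    modes = list(samples)
    for _ in range(iterations):
        shifted = []
        for m in modes:
            window = [s for s in samples if abs(s - m) <= bandwidth]
            shifted.append(sum(window) / len(window) if window else m)
        modes = shifted
    return modes

def dominant_cluster_mean(samples, bandwidth=1.0):
    """Average of the samples belonging to the most populated mode."""
    modes = mean_shift_1d(samples, bandwidth)
    clusters = []                      # list of (mode, members)
    for s, m in zip(samples, modes):
        for mode, members in clusters:
            if abs(mode - m) < bandwidth / 2:
                members.append(s)      # converged to an existing mode
                break
        else:
            clusters.append((m, [s]))  # first sample seen for this mode
    _, members = max(clusters, key=lambda c: len(c[1]))
    return sum(members) / len(members)

# Hypothetical 1D samples: most mass near 5, a few stragglers elsewhere.
samples_1d = [4.9, 5.0, 5.1, 5.0, 0.0, 0.1, 9.8]
estimate = dominant_cluster_mean(samples_1d)
print(estimate)
```

Averaging only within the winning cluster keeps samples from secondary modes from pulling the estimate away from the most-visited region.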

### Results vs. Bundler

- Cars
- Office

### Object Detection Results

### Runtime

- 20 minute runtime for 2 images
- Results not presented for more than 4 images
- Bad scaling
- Code released, but only as a 0.1 alpha version
- Ran Bundler on 4 images; it took < 3 minutes

### Questions
