I am in the process of collecting all my structure from motion data sets on
this page.
Right now there are only a few, but more will appear as I find time to add them.

One of the key challenges in making structure from motion systems robust to failure is the ability to handle outliers among the correspondences. In this paper we present two new algorithms that find the optimal solution in the presence of outliers when the camera undergoes a pure translation. The first algorithm has polynomial-time computational complexity, independent of the number of outliers. The second algorithm does not offer such a theoretical complexity guarantee, but we demonstrate that it is orders of magnitude faster in practice. Random sampling approaches such as RANSAC are not guaranteed to find an optimal solution, while our two methods are. We evaluate and compare the algorithms in both synthetic and real experiments. We also embed the algorithms in a larger system, where we optimize for the rotation angle as well (the rotation axis is measured by other means). The experiments show that for problems with a large number of outliers, the RANSAC estimates may deteriorate compared to our optimal methods.

Low rank approximation is an important tool in many applications. Given an observed matrix with elements corrupted by Gaussian noise, it is possible to find the best approximating matrix of a given rank through singular value decomposition. However, due to the non-convexity of the formulation, it is not possible to incorporate additional knowledge of the sought matrix without resorting to heuristic optimization techniques. In this paper we propose a convex formulation that is more flexible in that it can be combined with any other convex constraints and penalty functions. The formulation uses the so-called convex envelope, which is the provably best possible convex relaxation. We show that for a general class of problems the envelope can be computed efficiently and in some cases even has a closed form expression. We test the algorithm on a number of real and synthetic data sets and show state-of-the-art results.
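
As a concrete illustration of the SVD step mentioned above, the best rank-r approximation in the Frobenius (L2) sense follows from the Eckart-Young result and can be sketched in a few lines (a minimal NumPy example; variable and function names are my own):

```python
import numpy as np

def best_rank_r_approximation(M, r):
    """Best rank-r approximation of M in the Frobenius (L2) sense,
    obtained by truncating the singular value decomposition."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Keep only the r largest singular values and their singular vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Noisy observation of a rank-2 matrix.
rng = np.random.default_rng(0)
ground_truth = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
observed = ground_truth + 0.01 * rng.standard_normal((20, 15))
denoised = best_rank_r_approximation(observed, 2)
```

Note that this closed form is exactly what breaks down once extra constraints or missing data enter the picture, which is what motivates the convex envelope formulation above.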

In computer vision, many problems can be formulated as finding a low rank approximation of a given matrix. Ideally, if all elements of the measurement matrix are available, this is easily solved in the L2-norm using factorization. However, in practice this is rarely the case. Lately, this problem has been addressed using different approaches: one is to replace the rank term by the convex nuclear norm; another is to derive the convex envelope of the rank term plus a data term. In the latter case, matrices are divided into sub-matrices and the envelope is computed for each sub-block individually. In this paper a new convex envelope is derived which takes all sub-matrices into account simultaneously. This leads to a simpler formulation, using only one parameter to control the trade-off between rank and data fit, for applications where one seeks low rank approximations of multiple matrices with the same rank. We show in this paper how our general framework can be used for manifold denoising of several images at once, as well as for denoising a single image. Experimental comparisons show that our method achieves results similar to state-of-the-art approaches while being applicable to other problems such as linear shape model estimation.

Previous work on estimating the epipolar geometry of two views relies on being able to reliably match feature points based on appearance. In this paper, we go one step further and show that it is feasible to compute both the epipolar geometry and the correspondences at the same time based on geometry only. We do this in a globally optimal manner. Our approach is based on an efficient branch and bound technique in combination with bipartite matching to solve the correspondence problem. We rely on several recent works to obtain good bounding functions to battle the combinatorial explosion of possible matchings. It is experimentally demonstrated that more difficult cases can be handled and that more inlier correspondences can be obtained by being less restrictive in the matching phase.

This paper presents a simple and effective way of solving the robust subspace estimation problem where the corruptions are column-wise. The method we present can handle a large class of robust loss functions and is simple to implement. It is based on Iteratively Reweighted Least Squares (IRLS) and works in an iterative manner by solving a weighted least-squares rank-constrained problem in every iteration. By considering the special case of column-wise loss functions, we show that each such surrogate problem admits a closed form solution.
Unlike many other approaches to subspace estimation, we make no relaxation of the low-rank constraint and our method is guaranteed to produce a subspace estimate with the correct dimension.
Subspace estimation is a core problem for several applications in computer vision. We empirically demonstrate the performance of our method and compare it to several other techniques for subspace estimation. Experimental results are given for both synthetic and real image data including the following applications: linear shape basis estimation, plane fitting and non-rigid structure from motion.
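
As a rough sketch of the IRLS idea described above, here is an illustrative version with an L1-type column-wise loss (not the paper's exact algorithm or class of loss functions): each iteration is a weighted least-squares rank-constrained problem, solved in closed form by an SVD of the column-reweighted matrix.

```python
import numpy as np

def irls_subspace(M, dim, iters=50, eps=1e-8):
    """IRLS for robust subspace estimation with a column-wise L1-type loss
    (an illustrative sketch).  Each iteration solves a weighted
    least-squares rank-constrained problem in closed form via an SVD
    of the column-reweighted matrix."""
    U = np.linalg.svd(M, full_matrices=False)[0][:, :dim]   # initial basis
    for _ in range(iters):
        residuals = M - U @ (U.T @ M)              # column-wise residuals
        norms = np.linalg.norm(residuals, axis=0)
        w = 1.0 / np.maximum(norms, eps)           # IRLS weights for the L1 loss
        U = np.linalg.svd(M * np.sqrt(w), full_matrices=False)[0][:, :dim]
    return U

# Columns mostly lying in a 2-dimensional subspace, plus outlier columns.
rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]
M = np.hstack([basis @ (5.0 * rng.standard_normal((2, 40))),
               3.0 * rng.standard_normal((10, 5))])
U = irls_subspace(M, 2)
```

The returned basis always has exactly `dim` columns, reflecting the point above that no relaxation of the rank constraint is made.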

Many standard optimization methods for segmentation and reconstruction compute ML model estimates for appearance or geometry of segments, e.g. Zhu-Yuille 1996, Torr 1998, Chan-Vese 2001, GrabCut 2004, Delong et al. 2012. We observe that the standard likelihood term in these formulations corresponds to a generalized probabilistic K-means energy. In learning it is well known that this energy has a strong bias to clusters of equal size, which can be expressed as a penalty for KL divergence from a uniform distribution of cardinalities. However, this volumetric bias has been mostly ignored in computer vision. We demonstrate significant artifacts in standard segmentation and reconstruction methods due to this bias. Moreover, we propose binary and multi-label optimization techniques that either (a) remove this bias or (b) replace it by a KL divergence term for any given target volume distribution. Our general ideas apply to many continuous or discrete energy formulations in segmentation, stereo, and other reconstruction problems.

Outliers pose a problem in all real structure from motion systems. Due to the use of automatic matching methods one has to expect that a (sometimes very large) portion of the detected correspondences will be incorrect. In this paper we propose a method that estimates the relative translation between two cameras and simultaneously maximizes the number of inlier correspondences. Traditionally, outlier removal tasks have been addressed using RANSAC approaches. However, these are random in nature and offer no guarantees of finding a good solution. If the number of mismatches is large, the approach becomes costly because of the need to evaluate a large number of random samples. In contrast, our approach is based on the branch and bound methodology, which guarantees that an optimal solution will be found. While most optimal methods trade speed for optimality, the proposed algorithm has competitive running times on problem sizes well beyond what is common in practice. Experiments on both real and synthetic data show that the method outperforms state-of-the-art alternatives, including RANSAC, in terms of solution quality. In addition, the approach is shown to be faster than RANSAC in settings with a large number of outliers.
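
To make the inlier-maximization objective concrete, here is a toy sketch of counting correspondences consistent with a candidate translation, using the algebraic epipolar residual (the paper's error measure and branch and bound search are not reproduced here; this is only the scoring idea):

```python
import numpy as np

def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that cross_matrix(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def count_inliers(t, x1, x2, thresh=1e-6):
    """Count correspondences consistent with a pure translation t.
    x1, x2 are 3xN calibrated homogeneous points; a correspondence is an
    inlier if the algebraic epipolar residual |x2^T [t]_x x1| is small."""
    residuals = np.abs(np.sum(x2 * (cross_matrix(t) @ x1), axis=0))
    return int(np.sum(residuals < thresh))

# Synthetic scene: 30 correct correspondences plus 10 mismatches.
rng = np.random.default_rng(2)
offset = np.array([0.0, 0.0, 5.0])[:, None]        # scene in front of the cameras
X = rng.uniform(-1.0, 1.0, (3, 30)) + offset
t_true = np.array([1.0, 0.0, 0.0])
x1 = np.hstack([X, rng.uniform(-1.0, 1.0, (3, 10)) + offset])
x2 = np.hstack([X - t_true[:, None], rng.uniform(-1.0, 1.0, (3, 10)) + offset])
```

Evaluating this count for the true translation direction recovers the 30 correct matches, while a wrong direction scores far fewer.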

In this paper we consider the classical problem of finding a low rank approximation of
a given matrix. In a least squares sense a closed form solution is available via factorization. However, with additional constraints, or in the presence of missing data, the problem becomes much more difficult.
We show how to efficiently compute the convex envelopes of a class of rank minimization formulations. This opens up the possibility of adding further convex constraints and functions to the minimization problem, resulting in strong convex relaxations. We evaluate the framework on both real and synthetic data sets and demonstrate state-of-the-art performance.

The problem of finding a low rank approximation of a given measurement matrix is of key interest in computer vision. If all the elements of the measurement matrix are available, the problem can be solved using factorization. However, in the case of missing data no satisfactory solution exists. Recent approaches replace the rank term with the weaker (but convex) nuclear norm. In this paper we show that this heuristic works poorly on problems where the locations of the missing entries are highly correlated and structured, which is a common situation in many applications. Our main contribution is the derivation of a much stronger convex relaxation that takes into account not only the rank function but also the data. We propose an algorithm which uses this relaxation to solve the rank approximation problem on matrices where the given measurements can be organized into overlapping blocks without missing data. The algorithm is computationally efficient and we have applied it to several classical problems including structure from motion and linear shape basis estimation. We demonstrate on both real and synthetic data that it outperforms state-of-the-art alternatives.

Stereo matching is an inherently difficult problem due to ambiguous and noisy texture. The non-convexity and non-differentiability make local linear (or quadratic) approximations poor, thereby preventing the use of standard local descent methods. Therefore, recent methods are predominantly based on discretization and/or random sampling of some class of approximating surfaces (e.g. planes). While these methods are very efficient in generating a rough surface estimate, via either fusion of proposals or label propagation, the end result is usually not as smooth as desired. In this paper we show that, if the objective function is decomposed correctly, local refinement of candidate solutions can be performed using an ADMM approach. This allows searching over more general function classes, thereby resulting in visually more appealing smooth surface estimations.

One of the major research areas in computer vision is scene reconstruction from image streams. The advent of RGB-D cameras, such as the Microsoft Kinect, has led to new possibilities for performing accurate and dense 3D reconstruction. There are already well-functioning algorithms for acquiring 3D models from depth sensors, both for large and small scale scenes. However, these methods often break down when the scene geometry is not very informative, for example, in the case of planar surfaces. Similarly, standard image-based methods fail for texture-less scenes. We combine both color and depth measurements from an RGB-D sensor to simultaneously reconstruct both the camera motion and the scene geometry in a robust manner. Experiments on real data show that we can accurately reconstruct large-scale 3D scenes despite many planar surfaces.

Energies with high-order non-submodular interactions have been shown to be very useful in vision due to their high modeling power. Optimization of such energies, however, is generally NP-hard. A naive approach that works for small problem instances is exhaustive search, that is, enumeration of all possible labelings of the underlying graph. We propose a general minimization approach for large graphs based on enumeration of labelings of certain small patches. This partial enumeration technique reduces complex high-order energy formulations to pairwise Constraint Satisfaction Problems with unary costs (uCSP), which can be efficiently solved using standard methods like TRW-S. Our approach outperforms a number of existing state-of-the-art algorithms on well known difficult problems (e.g. curvature regularization, stereo, deconvolution); it attains near-global minima at better speed. Our main application of interest is curvature regularization.
In the context of segmentation, our partial enumeration technique makes it possible to evaluate curvature directly on small patches using a novel integral geometry approach.

In this paper we extend the 3D-label approach and demonstrate how to simultaneously fuse more than two proposals
for a 2nd order stereo regularizer. The optimization is
made possible by effectively computing a generalized distance transform. This
allows for computation of messages in linear time in the number of proposals, similar to the 1D-label case.
In addition, the approach provides a lower bound on the globally optimal solution of
the multi-fusion problem. We verify experimentally that the lower bound is very
close to the computed solution, thus providing a near optimal solution.

It is commonly believed that higher order smoothness should be modeled using higher order interactions.
For example, 2nd order derivatives for deformable (active) contours are represented by triple cliques.
Similarly, the 2nd order regularization methods in stereo predominantly use MRF models with scalar (1D) disparity labels
and triple clique interactions. In this paper we advocate a largely overlooked alternative approach to stereo where 2nd order surface smoothness is represented by pairwise interactions with 3D-labels, e.g. tangent planes.
This general paradigm has been criticized due to perceived computational complexity
of optimization in higher-dimensional label space. Contrary to popular beliefs, we demonstrate
that representing 2nd order surface smoothness with 3D labels leads to simpler optimization
problems with (nearly) submodular pairwise interactions. Our theoretical and experimental
results demonstrate advantages over state-of-the-art methods for 2nd order smoothness stereo.

Multiple rotation averaging is an important problem in computer vision. The problem is challenging because of the nonlinear constraints required to represent the set of rotations. To our knowledge, no globally optimal solution has been proposed for the case of simultaneous updates of the rotations. In this paper we propose a simple procedure based on Lagrangian duality that can be used to verify global optimality of a local solution by solving a linear system of equations. We show experimentally on real and synthetic data that unless the noise levels are extremely high, this procedure always generates the globally optimal solution.

We consider the least-squares (L2) minimization problems in multiple view geometry for triangulation, homography, camera resectioning and structure-and-motion with known rotation or known plane. Although optimal algorithms have been given for these problems under an L-infinity cost function, finding optimal least-squares solutions is difficult, since the cost functions are not convex and in the worst case may have multiple minima. Iterative methods can be used to find a good solution, but this may only be a local minimum. This paper provides a method for verifying whether a local-minimum solution is globally optimal, by means of a simple and rapid test involving the Hessian of the cost function. The basic idea is that by showing that the cost function is convex in a restricted but large enough neighbourhood, a sufficient condition for global optimality is obtained.
The method is tested on numerous problem instances of real data sets. In the vast majority of cases we are able to verify that the solutions are optimal, in particular, for small to medium-scale problems.
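
The flavor of such a verification test can be illustrated on a toy cost (sampling the Hessian on a box is only a heuristic stand-in for the paper's actual single-test condition):

```python
import numpy as np

def hessian(x, y):
    """Hessian of the non-convex toy cost f(x, y) = (x^2 - 1)^2 + y^2."""
    return np.array([[12.0 * x**2 - 4.0, 0.0],
                     [0.0, 2.0]])

def convex_on_box(center, radius, samples=21):
    """Heuristic check: is the Hessian positive semidefinite everywhere
    on a sampled grid around `center`?  If so, the cost is (empirically)
    convex on the box -- the ingredient behind the optimality test."""
    xs = np.linspace(center[0] - radius, center[0] + radius, samples)
    ys = np.linspace(center[1] - radius, center[1] + radius, samples)
    return all(np.all(np.linalg.eigvalsh(hessian(x, y)) >= 0.0)
               for x in xs for y in ys)

local_min = (1.0, 0.0)                 # a local minimum of the toy cost
certified = convex_on_box(local_min, 0.3)
```

The same check correctly fails around the origin, where the toy cost is concave in x, showing why local convexity is the right thing to certify.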

Many problems in computer vision can be turned into a large-scale boolean optimization problem, which is in general NP-hard. In this paper, we further develop one of the most successful approaches, namely roof duality, for approximately solving such problems for higher-order models. Two new methods that can be applied independently or in combination are investigated. The first one is based on constructing relaxations using generators of the submodular function cone. In the second method, it is shown that the roof dual bound can be applied in an iterated way in order to obtain a tighter relaxation. We also provide experimental results that demonstrate better performance with respect to the state-of-the-art, both in terms of improved bounds and the number of optimally assigned variables.

Geometric reconstruction from image collections is a classical computer vision problem. The problem essentially consists of two steps: first, the identification of matches and the assembly of point tracks; second, multiple view geometry computations. In this paper we address the problem of constructing point tracks using graph theoretical algorithms. From standard descriptor matches between all pairs of images we construct a graph representing all image points and all possible matches. Using Gomory-Hu trees we make cuts in the graph to construct the individual point tracks. We present both theoretical and experimental results (on real datasets) that clearly demonstrate the benefits of using our approach.

Here we propose an energy-based framework for approximating surfaces from a cloud of point measurements corrupted by noise and outliers. Our energy assigns a tangent plane to each (noisy) data point by minimizing the squared distances to the points and the irregularity of the surface implicitly defined by the tangent planes. In order to avoid the well-known "shrinking" bias associated with first-order surface regularization, we choose a robust smoothing term that approximates curvature of the underlying surface. In contrast to a number of recent publications estimating curvature using discrete (e.g. binary) labellings with triple-cliques, we use higher-dimensional labels that allow modeling curvature with only pairwise interactions. Hence, many standard optimization algorithms (e.g. message passing, graph cuts) can minimize the proposed curvature-based regularization functional. The accuracy of our approach for representing curvature is demonstrated by theoretical and empirical results on synthetic and real data sets from multi-view reconstruction and stereo.

This paper points out a few benefits of using a non-sequential approach to structure from motion.
Prior work on multi-view structure from motion is dominated by sequential approaches starting from a single two-view reconstruction, then adding new images one by one. In contrast, we propose a non-sequential methodology based on rotational consistency and robust estimation using convex optimization. The resulting system is more robust with respect to (i) unreliable two-view estimations caused by short baselines, (ii) repetitive scenes with locally consistent structures that are not consistent with the global geometry and (iii) loop closing as errors are not propagated in a sequential manner. Both theoretical justifications and experimental comparisons are given to support these claims.

This paper presents a non-incremental system for structure and motion.
Our solution is based on robustly computing global rotations from relative geometries and
feeding these into the known-rotation framework to create an initial solution for bundle adjustment.
To increase robustness we present a new method for constructing reliable point tracks from pairwise matches.
We show that our method can be seen as maximizing the reliability of a point track if the
quality of the weakest link in the track is used to evaluate reliability.
To estimate the final geometry we alternate between bundle adjustment and a robust version of the known-rotation formulation.
The ability to compute both structure and camera translations independent of initialization makes our algorithm insensitive
to degenerate epipolar geometries.

Triangulating a Plane
Olsson, Carl; Eriksson, Anders,
Scandinavian Conference on Image Analysis, 2011.

This paper investigates triangulation of a scene plane from two images.
It shows that when minimizing the back-projection error of the plane-induced homography,
this problem is quasiconvex and can be solved using standard methods.

This report points out a few benefits of using a non-sequential approach to structure from motion.
In particular we show that non-sequential approaches are insensitive to geometries with small baselines.
This is in contrast to sequential methods that in each step rely on a well-estimated structure to be able
to add new cameras.

This paper extends the L-infinity framework to problems in 1-dimensional vision when using angular errors.
We show that a number of 1D geometry problems are quasiconvex.
Using these results we also propose a branch and bound method for the full (bi-quasiconvex) structure and motion problem for 1D-vision.
Furthermore, we show that due to the limited field of view the problem is often ill-posed, and there exist multiple good
(in terms of reprojection error) local minima.
The paper builds on the preliminary results of this one.

Recent work on geometric vision problems has exploited convexity properties in order to obtain globally optimal solutions.
In this paper we give an overview of these developments and show the tight connections between different types of convexity
and optimality conditions for a large class of multiview geometry problems.
We also show how the convexity properties are closely linked to different types of optimization algorithms for computing the solutions.
This paper is based on some of the preliminary results in this one and this one.

Outlier Removal using Duality
Olsson, Carl; Eriksson, Anders; Hartley, Richard, IEEE Conference on Computer Vision and Pattern Recognition, 2010.
(bibtex and code)

This paper proposes two methods for outlier removal in structure from motion.
The first is an improvement of the algorithm presented by Sim & Hartley in this
paper.
We achieve a speedup of roughly an order of magnitude with the same guarantees.
The second one is related to the popular approach of
L1-approximation.
While this version is even more efficient in terms of computation time, it does not have the same guarantees as the first one.
We show using duality that the two methods are in fact related, and we demonstrate on a number of large scale data sets that both methods
work very well.

This paper proposes a branch and bound method for solving optimization problems involving orthogonal constraints,
and is based on the two conference papers here and here.
Our first application is a generalization of the absolute orientation problem introduced in
Horn et al. Rather than only registering two point clouds, we also use point-to-plane and point-to-line correspondences.
The second application considered is calibrated camera pose estimation.
We show on both real and synthetic data that local minima do occur for this problem.

In this paper we study optimization problems with quasiconvex residuals under the L2-norm.
We consider the same framework as is introduced in
Kahl-Hartley-2008. However, since we are not using the max-norm but rather the least squares formulation,
we cannot be sure that the resulting function is free of local minima.
Rather than resorting to more complex global optimization methods, such as branch and bound,
we derive a simple test for verifying that a given local minimum is indeed global.
We show on a number of real data sets that this approach is very efficient and in the vast majority of all cases
we are able to verify optimality.

This paper extends this one by considering 1-dimensional vision problems.
In this case it is more natural to use the sum of squared angular errors.
We show that, as in the 2D case, it is possible to derive a test for verifying optimality.
Furthermore, we present a probabilistic method that can, with high probability, verify global optimality
should the former test fail.

In this paper we study how to fill in the missing data entries of a matrix under a low rank constraint.
The well known
factorization method can be used to solve for both structure and motion simultaneously
if the entire observation matrix is known. However, if some of the entries are missing these have to be
filled in using a rank constraint. Since the rank function is non-convex we propose to instead choose the
missing data that minimizes the sum of the singular values, the nuclear norm, of the measurement matrix.
This approach is similar to the
L1-relaxation
method that has been successfully applied to
compressed sensing and other related areas.
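
A minimal sketch of the nuclear norm heuristic for missing data is the following SoftImpute-style iteration (the function and parameters here are illustrative, not this paper's solver): alternate the proximal step of the nuclear norm, i.e. soft-thresholding of singular values, with re-imposing the observed entries.

```python
import numpy as np

def complete_low_rank(M, mask, tau=0.1, iters=300):
    """Fill in missing entries (mask == False) by alternating singular
    value soft-thresholding -- the proximal step of the nuclear norm --
    with re-imposing the observed data.  An illustrative SoftImpute-style
    sketch of the nuclear norm heuristic."""
    X = np.where(mask, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        X = np.where(mask, M, X)                  # keep observed entries fixed
    return X

# Rank-2 ground truth with roughly 30% of the entries missing.
rng = np.random.default_rng(3)
ground_truth = rng.standard_normal((15, 2)) @ rng.standard_normal((2, 12))
mask = rng.random((15, 12)) < 0.7
completed = complete_low_rank(ground_truth, mask)
```

On randomly missing entries of a low rank matrix this simple scheme fills in the gaps far better than leaving them at zero.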

This paper studies the generalized registration problem introduced here.
In particular we investigate the quality of the lower bound on the global optimum which is provided by
Lagrangian duality. Since the orthogonality constraints are non-convex we cannot expect that there is no duality gap.
However, we show that the obtained solution is always better than direct linear methods, and in practice close to
the optimum.

In this paper we study outlier removal for problems with pseudoconvex residuals.
We show that these problems can be solved in polynomial time under the max-norm.
Specifically, if the dimension of the search space is n (e.g. triangulation n=3, homography estimation n=8) then
there is always a set of no more than n+1 residuals that determines the solution.

This paper studies how to effectively solve multiple view geometry problems under the max-norm.
The framework is the same as that of
Kahl-Hartley-2008. We show that these problems are not only quasiconvex but pseudoconvex, which is
a stronger property. This opens up the possibility of using other, more efficient algorithms than the bisection algorithm.
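
The bisection algorithm referred to above can be sketched generically: binary search on the error level gamma, where each feasibility query is a convex problem for quasiconvex residuals (the 1D instance below is a toy with a closed-form feasibility test, not a multiview problem):

```python
import numpy as np

def bisection_minmax(feasible, lo, hi, tol=1e-9):
    """Bisection on the error level gamma.  `feasible(gamma)` must answer
    whether some x attains max-residual <= gamma; for quasiconvex
    residuals this is a convex feasibility problem (e.g. an SOCP)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid      # a solution with error <= mid exists
        else:
            lo = mid
    return hi

# Toy 1D instance: residuals f_i(x) = |x - a_i|.  The gamma-sublevel sets
# are intervals, so feasibility is just an interval-intersection test.
a = np.array([0.0, 1.0, 4.0])

def feasible(gamma):
    return a.max() - a.min() <= 2.0 * gamma

gamma_star = bisection_minmax(feasible, 0.0, a.max() - a.min())
```

For this instance the optimal max-error is (max - min)/2 = 2, which the bisection recovers to within the tolerance; pseudoconvexity is what lets one replace this per-level feasibility search with more efficient descent-style algorithms.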

This paper extends the L-infinity framework to problems in 1-dimensional vision when using angular errors.
We show that a number of 1D geometry problems are quasiconvex.
Using these results we also propose a branch and bound method
for the full (bi-quasiconvex) structure and motion problem for 1D-vision.

This paper considers a generalization of the absolute orientation problem introduced in
Horn et al. Rather than only registering two point clouds, we also use point-to-plane and point-to-line correspondences.
We propose a branch and bound method that efficiently solves this problem.

In this paper we present a branch and bound method for calibrated camera pose estimation.
We show on both real and synthetic data that local minima do occur for this problem.

This paper presents a reformulation of the normalized cuts criterion that allows incorporation of linear equality constraints.
This is done by restating the problem and showing how linear constraints can
be enforced exactly in the optimization scheme through duality.