Tuesday, May 17, 2011

metric rectification from circles

The circular points encode the metric structure of the 2D plane. They are invariant under similarity transformations: rotation, isotropic scaling, and translation. If the plane undergoes a more general (e.g. affine or projective) transformation they move to some unknown location. Metric rectification can be performed if we find the image of the circular points and move them back to their original location. These points are not directly visible in the image, but they can be found indirectly, for instance from the images of right angles [HZ].
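This invariance is easy to verify numerically. The following standalone Python sketch (illustrative only, not part of easyVision) applies an arbitrary similarity transformation to the circular point (1, i, 0) and checks that it is a fixed point up to a complex scale factor:

```python
import math

def apply_h(H, p):
    # homogeneous transformation of a (possibly complex) point: p' = H p
    return [sum(H[r][c] * p[c] for c in range(3)) for r in range(3)]

# an arbitrary similarity: rotation theta, isotropic scale s, translation (tx, ty)
theta, s, tx, ty = 0.7, 2.5, 4.0, -1.0
H = [[s * math.cos(theta), -s * math.sin(theta), tx],
     [s * math.sin(theta),  s * math.cos(theta), ty],
     [0.0,                  0.0,                 1.0]]

I = [1, 1j, 0]          # the circular point (1, i, 0)
Ip = apply_h(H, I)
lam = Ip[0]             # complex scale factor

print([z / lam for z in Ip])   # -> [1, i, 0] up to rounding
```

An affine or projective H, in contrast, sends (1, i, 0) to a genuinely different point, which is exactly what rectification must undo.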

The circular points can also be extracted from the images of two circles [IC]. Any two circles in general position intersect at two possibly complex points. Interestingly, if we allow points at infinity there are two additional intersection points: the circular points (1, i, 0) and (1, -i, 0), which belong to every circle. If the plane undergoes a projective transformation the circles become general conics (in a perspective view with the circles in front of the camera the images are ellipses). Incidence and tangency are preserved by projective transformations, so the images of the circular points are among the intersections of the observed ellipses.
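The incidence of the circular points with every circle follows directly from the homogenized circle equation x^2 + y^2 + D x w + E y w + F w^2 = 0: at (1, +/-i, 0) the quadratic part gives 1 + (+/-i)^2 = 0, whatever the values of D, E, F. A small standalone Python check:

```python
def circle_value(D, E, F, p):
    # homogenized circle: x^2 + y^2 + D*x*w + E*y*w + F*w^2
    x, y, w = p
    return x * x + y * y + D * x * w + E * y * w + F * w * w

I, J = (1, 1j, 0), (1, -1j, 0)

# every choice of D, E, F (i.e. every circle) contains both circular points
for D, E, F in [(0, 0, -1), (3, -2, 7), (-5, 4, 0.5)]:
    print(circle_value(D, E, F, I), circle_value(D, E, F, J))  # -> 0j 0j
```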

This nice visual geometry result is illustrated by a simple easyVision demo. I like it because it touches many interesting mathematical themes. I have recently rewritten the code to use higher level combinators.

To achieve metric rectification from the image of a plane containing several circles we must solve the following subproblems. All of them involve interesting mathematical results.

1) Detection of salient regions in the image. We can use fixed gray level thresholding and extraction of connected components, which can be represented by reduced contours:
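As a language-neutral illustration of this step (the demo itself uses easyVision's image processing primitives), here is a minimal pure-Python sketch of fixed thresholding followed by 4-connected component labeling via flood fill:

```python
from collections import deque

def threshold(img, t):
    # fixed gray-level thresholding -> binary image
    return [[1 if v > t else 0 for v in row] for row in img]

def connected_components(binary):
    # label 4-connected foreground regions with a BFS flood fill
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(w):
            if binary[i][j] and not labels[i][j]:
                next_label += 1
                labels[i][j] = next_label
                q = deque([(i, j)])
                while q:
                    r, c = q.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w \
                           and binary[rr][cc] and not labels[rr][cc]:
                            labels[rr][cc] = next_label
                            q.append((rr, cc))
    return labels, next_label

img = [[0, 200, 200, 0,   0],
       [0, 200,   0, 0, 180],
       [0,   0,   0, 0, 180]]
labels, n = connected_components(threshold(img, 128))
print(n)  # -> 2
```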

2) Find elliptical shapes, and estimate the conic parameters:
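Conic estimation can be illustrated with the classical five-point fit: each point contributes one linear equation in the six homogeneous conic coefficients. A standalone Python sketch (not the easyVision estimator), normalizing f = -1, which is valid whenever the conic does not pass through the origin, and solving the resulting 5x5 system by Gaussian elimination:

```python
import math

def solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def fit_conic(pts):
    # a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0, normalized with f = -1
    # (moving f to the right-hand side gives an inhomogeneous 5x5 system)
    A = [[x * x, x * y, y * y, x, y] for x, y in pts]
    return solve(A, [1.0] * 5)  # -> [a, b, c, d, e]

# five points on the unit circle -> should recover x^2 + y^2 - 1 = 0
pts = [(math.cos(t), math.sin(t)) for t in (0.1, 1.0, 2.2, 3.5, 5.0)]
a, b, c, d, e = fit_conic(pts)
print([round(v, 6) + 0.0 for v in (a, b, c, d, e)])  # -> [1.0, 0.0, 1.0, 0.0, 0.0]
```

In practice, with many noisy contour points, one would solve the overdetermined homogeneous system by least squares (SVD) instead.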

3) Compute the intersection of two ellipses (e.g. the two largest ones). We must find the roots of a polynomial of degree 4. It is a good idea to reduce the problem to a canonical form in which one of the ellipses is the unit circle and the other has only 4 parameters (center and axes). A polynomial of degree 4 can in principle be solved in closed form, but we can also use simpler numerical methods: we can compute the eigenvalues of the companion matrix, or directly use the GSL polynomial solver, also available from hmatrix.
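As a standalone illustration of the numeric route (independent of the GSL solver used in the demo), the quartic can also be solved by Durand-Kerner iteration, which finds all complex roots simultaneously without any eigenvalue machinery:

```python
def durand_kerner(coeffs, iters=200):
    # Durand-Kerner simultaneous iteration for all roots of a monic polynomial;
    # coeffs = [c3, c2, c1, c0] represents x^4 + c3 x^3 + c2 x^2 + c1 x + c0
    n = len(coeffs)

    def p(x):
        v = 1 + 0j
        for c in coeffs:
            v = v * x + c
        return v

    roots = [(0.4 + 0.9j) ** (k + 1) for k in range(n)]  # distinct complex seeds
    for _ in range(iters):
        new = []
        for i, r in enumerate(roots):
            d = 1 + 0j
            for j, s in enumerate(roots):
                if j != i:
                    d *= r - s
            new.append(r - p(r) / d)
        roots = new
    return roots

# (x^2 - 1)(x^2 - 4) = x^4 - 5 x^2 + 4, with roots -2, -1, 1, 2
roots = sorted(durand_kerner([0, -5, 0, 4]), key=lambda z: z.real)
print([round(z.real, 6) for z in roots])  # -> [-2.0, -1.0, 1.0, 2.0]
```

Complex conjugate roots, which is the interesting case for the circular points, come out of the same iteration with no extra work.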

We get two pairs of complex conjugate solutions, which define two real lines. We can easily separate the "true" circular points, which define the horizon, from the other "normal" intersections:
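The real lines arise because a conjugate pair of points always spans a real line: l = p x conj(p) satisfies conj(l) = -l, so its coordinates are purely imaginary and dividing by i gives a real line. A standalone Python sketch; applied to the true circular points it recovers the line at infinity, and applied to their images it yields the imaged horizon:

```python
def cross(p, q):
    # line joining two homogeneous points (or point of intersection of two lines)
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]

def real_line(p):
    # line through the conjugate pair (p, conj(p)); l = p x conj(p) has purely
    # imaginary coordinates, so dividing by i yields a real line
    l = cross(p, [z.conjugate() for z in p])
    return [(z / 1j).real for z in l]

# the line through the circular points themselves is the line at infinity
print(real_line([1, 1j, 0]))  # -> [0.0, 0.0, -2.0], i.e. (0, 0, 1) up to scale
```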

intersections of the two bigger ellipses (real part and lines joining them)
The rectification transformation can be easily computed from the imaged circular points:

If the aspect ratio and principal point are known the focal length can be inferred and a full camera can be recovered. Then we can insert virtual objects in the scene

and show the camera position in 3D space:
It is also interesting to show a graphic representation of the projective invariants of two conics. They are closely related to the common tangents, which can be computed as intersections of the corresponding dual conics:

eigenvectors of C1*inv C2 (yellow) and common tangents (orange)
The demo program is just a composition of pure functions for the above steps. We intercalate several monitoring windows in the pipeline. The source code looks simple but it uses many tools from the easyVision packages. Here is a video of the program in action:

Wednesday, March 02, 2011

easyVision in a netbook

I have updated the installation instructions, including help for the recent IPP 7.0.

In one of the tests I have installed the whole system (except the CUDA package for SIFT) in a cheap netbook with Ubuntu 10.10. The applications obviously run slower, but I am surprised to find that most of them are perfectly usable. I can use it for class demos.

The built-in camera works without any problem, and two more USB webcams are supported. The following screenshot shows an application working with three live views. Synchronization is really bad, but in static scenes we can try out many interesting multiview experiments.

Tuesday, February 22, 2011

Tensor Diagrams

In 2009 we attended ICCV at Kyoto. Visiting Japan was a fantastic experience.

We presented a graphic interpretation of multiview geometry:

For instance, this is the beautiful internal structure of the Fundamental Matrix:

All tensor constructions in the paper were developed and checked with the Haskell package hTensor.

Monday, February 14, 2011


I'm sorry that I have not updated this blog for so long; there are many ideas to try out! But I have decided that at least I must include the screenshots of the prototypes I am working on, if only to remember them in the future.

We are happy to have a paper accepted at CVPR 2011, so I will start with the screenshot of an old friend:
This is obtained by a Haskell implementation of the GEA (global epipolar adjustment) method proposed in the paper. It is faster than sparse bundle adjustment and gets a solution which is very close to the optimal one. Here are two more 3D reconstruction examples from point correspondences taken from Bundle Adjustment in the Large:



Thursday, May 28, 2009


The library now includes an interface to Changchang Wu's excellent GPU implementation of SIFT. Vision problems that were until recently considered infeasible can now be easily solved on standard computers.

With an NVIDIA 9800 GT we obtain interest points and descriptors at ~25 fps for 640x480 images, depending on the level of detail in the scene. Fast GPU point matching is also provided.

(Using an 8600M GS we get ~10 fps. Most of the time is spent at the smallest scales, so we can get a big speedup if we start from octave 1.)

To appreciate the computing power of SiftGPU you should run the detector on a live camera. Here are some screenshots of a simple object recognition prototype. First, of course, the H&Z book:

Then Real World Haskell, detected just from a leg of the insect:

This works very well, but the number of known objects is small. We need a fast method for feature indexing.

SiftGPU requires specific hardware, but it is extremely useful. Clearly, easyVision must finally be cabalized so that we can install a basic, portable system plus any desired optional modules.

Monday, January 12, 2009


After a long delay, I have finally released a first version of a tutorial for easyVision. I know that this system is only useful to me, but trying to explain something to other people is the best way to find things that can be improved, have a poor design, or, more probably, that I don't really understand.

For instance, I noticed that the definition of camera combinators was too verbose. We can now write more elegant programs like this:

main = run $ camera >>= observe ~~> f . g . h >>= zoom

In Haskell, if a program compiles it is very likely correct. Also, ugly code strongly suggests that the essential problem is not well understood. Trying to write a better solution we often discover that it can be reformulated in a more general way. And this opens up still more possibilities for improvement.

Saturday, June 28, 2008

Interest Points

Recent algorithms for detection of scale and rotation invariant interest points have produced fantastic advances in many computer vision fields. For instance, explicit segmentation is no longer required in some object recognition applications.

Interest points are typically computed as local extrema in the scale-space of an appropriate response function. The celebrated SIFT detector by D. Lowe is based on the Laplacian, which is approximated by differences of Gaussians in an image pyramid.
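The quality of that approximation is easy to check in 1D: since dG/d(sigma) = sigma * d2G/dx2, we have G(x, k*sigma) - G(x, sigma) ~ (k - 1) * sigma^2 * d2G/dx2. A standalone Python sketch comparing both sides:

```python
import math

def G(x, s):
    # 1-D Gaussian kernel of standard deviation s
    return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def Gxx(x, s):
    # analytic second derivative of the Gaussian (1-D Laplacian)
    return G(x, s) * (x * x / s ** 4 - 1.0 / s ** 2)

# DoG(x) = G(x, k*sigma) - G(x, sigma) ~ (k - 1) * sigma^2 * Gxx(x, sigma)
sigma, k = 1.0, 1.02
for x in (0.0, 0.5, 1.5, 2.0):
    dog = G(x, k * sigma) - G(x, sigma)
    lap = (k - 1) * sigma ** 2 * Gxx(x, sigma)
    print(f"{x:4.1f}  {dog:+.6f}  {lap:+.6f}")
```

The two columns agree to a few percent; the approximation error shrinks as the scale ratio k approaches 1.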

I'd like to include in the easyVision library a reference implementation of a good interest point detector. Some experiments indicate that the determinant of the Hessian gives very satisfactory results. In a first stage I am not very worried about speed, so we will explicitly work with the full image smoothed at different levels, without any downsampling.
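A minimal standalone Python sketch of the determinant-of-Hessian response, using finite differences on a synthetic Gaussian blob (the real detector of course works on smoothed camera images): the response peaks exactly at the blob center.

```python
import math

def blob(cx, cy, s, w, h):
    # synthetic image: a single Gaussian blob centered at (cx, cy)
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s * s))
             for x in range(w)] for y in range(h)]

def det_hessian(img, y, x):
    # finite-difference Hessian determinant at an interior pixel
    ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return ixx * iyy - ixy * ixy

img = blob(12, 9, 3.0, 24, 18)
best = max((det_hessian(img, y, x), x, y)
           for y in range(1, 17) for x in range(1, 23))
print(best[1], best[2])  # -> 12 9
```

For a Gaussian blob the response is f^2 * (1 - r^2/s^2) / s^4 up to discretization, positive only near the center, which is why the determinant of the Hessian localizes blobs so cleanly.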

The following screenshot shows the interest points detected on a 640x480 image by analyzing 15 scales up to sigma=25. Blobs of different sizes are detected with excellent accuracy.

Woody Allen's Interest Points

This extremely unoptimized algorithm is nevertheless quite fast. With half-size images (320x240) we get more than 15 fps on a modern laptop, and at 640x480 it runs at 3 fps. This is of course not real time, although it could be acceptable for some live video applications. In any case, note that the program currently runs single-threaded. Most of the time is spent in the Gaussian filters, which can be computed in parallel. (This is the next step I will tackle, but first I must consolidate a few things in the library.)

A simple descriptor based on local gradient distributions is also computed for each point; it can be used for object recognition and image matching:

(The current implementation of the rotation invariant descriptor, based on explicit image warping, is quite slow. In the above examples we work with a scale-only invariant descriptor.)

I am currently preparing more detailed installation instructions and other information about the library.

In a future post I will describe some low level algorithms implemented directly in Haskell.