Optical Flow or Motion Estimation Using the Watson-Ahumada (WA) Algorithm

SIDDHARTH_JAIN

4.98/5 (23 votes)

Aug 10, 2007

CPOL

12 min read

144396

13893

Optical Flow or Motion Estimation Using the Watson-Ahumada (WA) Algorithm

Introduction and Background

This article will teach you how to estimate motion in a stream of images. I saw that some articles already exist on CodeProject for motion detection. To be precise whereas motion detection amounts to just detecting if there is any movement, motion estimation means you have to estimate the direction of motion also. Motion estimation has been extensively studied by both computer vision and human vision researchers for more than 25 years. It is also commonly referred to as optical flow computation especially in the computer vision literature. The algorithm that I will use in this article was given by Watson & Ahumada (WA) in the following paper which I highly encourage you to read:

Watson, A. B., & Ahumada, A. J., Jr. (1985). Model of human visual-motion sensing. Journal of the Optical Society of America A 2(2), 322-342.

The beauty of this algorithm is that it is a fairly accurate model of how our brain itself senses motion. In order to understand the algorithm some familiarity with Fourier analysis may be needed. The algorithm works by resolving the input stimulus into its component sinusoidal gratings oriented in different directions and estimating the motion of those gratings. This approach works well because determining the motion of a sine grating is easy: there exists a simple relationship between the spatial and temporal frequencies of the grating and its velocity (ω_xv_x+ω_yv_y+ω_t=0 where ω_x,ω_y are the spatial frequencies of the grating v_x,v_y is the velocity of the grating and ω_t is the temporal frequency of the grating). Having given you the necessary pointers that explain the algorithm in detail, let me now move on to how you can use the algorithm for optical flow estimation in your own applications.

Using the Code

The WAv2.zip file available for download with this article has two projects:

WAv2: This is a class library (DLL) that contains the functions to do optical flow computation. It uses two DLLs: fft.dll and mapack.dll which should appear in the bin/Debug directory of WAv2 project.
Examples: This is a project which shows you how to use the functions in WAv2.dll to do optical flow computation. Six test cases of optical flow computation are covered: simple, medium, complex, blocks, grid, vcbox. In each case 8 images are input to the WA algorithm; the algorithm analyzes the images and computes the resulting optical flow. You will need to download the images for these test cases from this website. Download the simple.tar.gz, medium.tar.gz, complex.tar.gz, blocks.tar.gz, grid.tar.gz, vcbox.tar.gz files and unzip them into some folder. I created animated gifs of these test cases so that you can see the motion in them. Below are shown the animated gifs and also the optical flow extracted by my code for all the 6 test cases:
Simple:

Medium:

Complex:

Blocks:

Grid:

Vcbox:

The most important function in WAv2.cs is:

private float[][,,] _WAComputeOpticalFlow(ComplexF[,,] I, 
    double fr, int N, int[] LevelsToCompute)

This is where the optical flow computation is done. Note that I have declared the function as private and there is no need for you to call it directly. Instead there are a number of public functions in WAv2.cs with names WAComputeOpticalFlow all of which at some point call _WAComputeOpticalFlow. You should use one of these functions for your optical flow computation. I will now go through an example in detail; as we work through the example, it will become clear what the different arguments to _WAComputeOpticalFlow mean. The example I choose is the test case: simple. In the file examples.cs there is a function Simple() which looks like this:

static void Simple()
{
    string dir = "c:/c#/optic_flow_test_data/simple/"; 
         // the directory where images are stored

    int N = 8;            // number of files we wish to read 

         // for optical flow estimation

    string[]    filenames = new string[N];
    for(int i = 0; i < N; i++)
        filenames[i] = dir + "anim."+(i+10)+".tif";
    WAdetector WA = new WAdetector();
    float[,,] f = WA.WAComputeOpticalFlow(filenames);    
        // compute optical flow for the sequence 

        // of images specified in filenames

    float[,,]    quiver_data = new float[f.GetLength(0),
        f.GetLength(1),2];    // for the quiver plot

        for(int y = 0; y < quiver_data.GetLength(0); y++)
        for(int x = 0; x < quiver_data.GetLength(1); x++)
        {
            quiver_data[y,x,0] = f[y,x,0]*f[y,x,2]; 
                // scale vx by weight w

            quiver_data[y,x,1] = f[y,x,1]*f[y,x,2]; 
                // scale vy by weight w

        }
        Bitmap bmp = WAdetector.Quiver(quiver_data);    
        // creates a bitmap to visualize the data in quiver_data

        int date = DateTime.Now.Year*1000 + 
            DateTime.Now.Month*100 + DateTime.Now.Day;    
            // get date in yyyymmdd format

        int time = DateTime.Now.Hour*1000 + 
            DateTime.Now.Minute*100 + DateTime.Now.Second; 
            // get time in hhmmss format

        string filename = dir + "simple_flow-" + 
            date + "-" + time + ".bmp";
        bmp.Save(filename); // save the quiver plot

}

The variable dir points to the directory where the images are stored. Change the value of this variable accordingly if you store the images in some location other than c:/c#/optic_flow_test_data/simple/. N=8 images will be given as input to the optical flow algorithm. You will understand why I have chosen 8 images as we work through the example. filenames just stores the names of the files. The optical flow computation is done by the following statement:

float[,,] f = WA.WAComputeOpticalFlow(filenames);    
   // compute optical flow for the sequence of images specified in 

   // filenames

The optical flow is returned in the variable f. f[y,x,0] is v_x, the velocity in x direction at location (x,y). f[y,x,1] is v_y, the velocity in the y direction at location (x,y). f[y,x,2] is a weight given to the velocity estimate (higher weight means the velocity estimate is more accurate). The velocity estimates are returned in a left-handed coordinate system meaning that v_x is positive to the right and v_yis positive to the bottom.

The optical flow can be visualized as a quiver plot using the function WAdetector.Quiver. In my quiver plots I usually scale the velocity estimates by their weight. This is what is done by lines 10-16. Lines 18-21 save the quiver bitmap as a BMP file. The quiver plot has been shown above. Looking at the quiver plot you can see the algorithm correctly captures the rotation of the square and the movement of the sphere in the input image sequence. Let us now look more closely at what the following line does:

float[,,] f = WA.WAComputeOpticalFlow(filenames);    
   // compute optical flow for the sequence of images specified in 

   // filenames

The function WAComputeOpticalFlow(filenames) computes the optical flow for the sequence of images whose filenames are specified in the array filenames. Currently I support only two formats: PixelFormat.Format8bppIndexed which is what many GIF files are and PixelFormat.Format24bppRgb which is what many TIF and JPG files are.

WAComputeOpticalFlow(filenames) uses default values of parameters needed by the WA algorithm and calls the function public float[][,,] WAComputeOpticalFlow(int sampling_rate, int fd, int T, int num_orientations, int[] levels_to_compute, string[] filename) with the default parameters as shown by following code snippet:

public    float[,,]    WAComputeOpticalFlow(string[]    filename)
{
    /* use default values for the parameters needed by the model */
    Bitmap bmp;
    try
    {
        bmp = new Bitmap(filename[0]);
    }
    catch
    {
        throw new Exception("ReadImage: unable to read file " + 
            filename[0]);
    }
    int h = bmp.Height;
    int w = bmp.Width;
    int x = (int) Math.Round(0.5*(Math.Log(h,2) + Math.Log(w,2)));
    int n = x - 5;
    if  (n < 0) n = 0;
    if  (n > 3) n = 3;
    int[]    levels_to_compute = new int[]{n};
    float[][,,] f = WAComputeOpticalFlow(80, 2, 16, 10, 
        levels_to_compute, filename);
    return f[0];
}

The most important parameter setting you may wish to understand is the timing information. If you have used any optical flow algorithms before, many of them just take two images as input and compute the flow between them. Well, if I show you a sequence of images at a frame rate of 1Hz there is no chance that you will see motion in them. When you go to the theater the images are played at a rate of 24-30Hz or so because it is only then that the perception of motion occurs. Because the WA algorithm is a model of human visual motion sensing you need to specify such timing information to the algorithm in addition to giving it the sequence of images. There are two things I wish you to understand:

First, the motion that your brain senses at time t is based on what your eyes saw from time t-T to t where T is approximately 200ms. The intensity matrix I(y,x,n) that is input to _WAComputeOpticalFlow and computed by WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename) represents discrete-time samples of what your eyes saw in this 200ms period. You specify the sampling rate at which these samples were taken by the variable sampling_rate. If there are a total of M samples which is the length of I along the n dimension (M=I.GetLength(2)) then M/sampling_rate should equal T=200ms. Because the algorithm takes a FFT (Fast Fourier Transform) of I it requires M to be a power of 2. The higher M is, the better; it should at least be 16 otherwise the optical flow estimates may not be very accurate. Using M=16 we get a sampling_rate of 16/0.2=80Hz.
The next thing you want to consider is what is the frame duration (frame duration = 1/frame rate) of the images that would be input to the algorithm. As I mentioned before when you go to a theater the images are played at a certain frame rate to create the perception of motion. It is found that a frame duration of about 30ms corresponding to a frame rate of about 33Hz is optimal for motion perception. In the function WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename) I constrain the frame duration to be an integer multiple of 1/sampling_rate; frame duration = fd/sampling_rate. It will become clear in just a moment why I do this. With sampling_rate=80Hz I find a frame duration of 2/80=25ms works well in practice.

Let me summarise: with sampling_rate=80Hz, M=16 the intensity matrix I in WAComputeOpticalFlow represents 16 samples of what your eyes saw over a time period of T=16/80=200ms. With a frame duration of 2/80=25ms each image stays on the eye or the retina for 25ms. If you have really understood the above discussion you understand this means each image will have to be repeated fd=2 times in the intensity matrix (25ms = 2 samples at a sampling rate of 80Hz). This is how the intensity matrix is created in WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename). By now, hopefully you can also understand why I constrained the frame duration to be an integer multiple of 1/sampling_rate. I leave it as an exercise to you to write a function in which the frame duration is not constrained in this manner if you may happen to need such a function.

There are two other parameters of WAComputeOpticalFlow that need some explanation: num_orientations and levels_to_compute. num_orientations specifies the number of orientations/gratings into which the input will be resolved. Just set num_orinetations=10 and don't mess with it. levels_to_compute is an array through which you specify the spatial scales or levels at which you are interested in getting the optical flow information. If you are familiar with Fourier analysis you know the Fourier Transform allows an image to be expressed in terms of its spatial frequency components. The different levels of the WA algorithm are sensitive to different spatial frequencies of the input as shown in following figure:

In my current implementation, I do not allow for computing optical flow at levels higher than 3. However, you can change this to your heart's content by modifying the value of private const int MAXLEVELS in WAv2.cs. In my implementation, if the input has a resolution of h by w pixels, then level i returns optical flow computed at a resolution of 2^-ih by 2^-iw pixels. It is possible to write an implementation in which the resolution does not decrease as the level number increases. However, it may be something that is unwarranted, because the low frequency components of an image represent its coarse structure and supersampling this coarse structure may not give any additional information.

Let me now address the practical question: what should be the value of levels_to_compute? The first thing to note is that the entries of levels_to_compute have to be in ascending order, otherwise the optical flow computation will crash. The level at which you may want optical flow to be computed is really application-dependent. You should compute the flow at all levels meaning levels_to_compute = new int[]{0,1,2,3} and then narrow down the levels that seem to work best or are sufficient for you. If you are familiar with Gaussian Pyramids that's what the levels_to_compute is. Level i instructs the algorithm to compute optical flow at level i of the Gaussian Pyramid of the input.

If you are confused, don't worry. Use lines 13-19 of the function WAComputeOpticalFlow above as a guideline to choose the level at which to compute the optical flow. I have found this guideline to be useful in choosing what level to use at least in my applications. The guideline is based on the established fact that the power spectrum of natural images displays a power law behavior and 1/f² frequency dependence, e.g. read following paper: Modeling the Power Spectra of Natural Images: Statistics and Information by van der Schaaf A. & van Hateren J.H. in Vision Research, Volume 36, Number 17, September 1996 , pp. 2759-2770(12) (a free version of the article is available here).

What this means is that most of the power is concentrated at low spatial frequencies and it rapidly falls off with increase in frequency. Therefore you may be better off computing optical flow at higher levels because that's where most of the power is. As you will experiment with the different levels, you may find that the lower levels are well tuned for detecting very small displacements whereas higher levels respond better to relatively larger displacements; this is because very small displacements will get washed out at higher levels of the pyramid due to Gaussian blurring and subsampling.

One way to test this out is to create a display of random dots undergoing radial motion. The displacement of a dot is directly proportional to its distance from the center of expansion/contraction. When I made such a display (512 by 512 pixel resolution) the following are the optical flows I got at successively higher levels.

At level 0 (center frequency = 1/4 cycles/pixel):

At level 1 (center frequency = 1/8 cycles/pixel):

At level 2 (center frequency = 1/16 cycles/pixel):

At level 3 (center frequency = 1/32 cycles/pixel):

At level 4 (center frequency = 1/64 cycles/pixel):

You can see from the above images that lower levels capture flow at center (small displacements), whereas higher levels capture flow at periphery (larger displacements).

Enough said. Just a few more pointers: what does the return value of WAComputeOpticalFlow(int sampling_rate, int fd, int M, int num_orientations, int[] levels_to_compute, string[] filename) mean? Let the return value be stored in a variable f. Then f[i][y,x,0] is v_x, the velocity in the x direction; f[i][y,x,1] is v_y, the velocity in the y direction; f[i][y,x,2] is a weight associated with the(v_x,v_y) estimate and all are computed at location (x,y) and level i. The velocity estimates are returned in a left-handed coordinate system meaning that v_xis positive to the right and v_yis positive to the bottom.

Visualizing the optical flow as a quiver plot using WAdetector.Quiver is great but sometimes you may want to store and read the raw optical flow. The functions public static void SaveOpticalFlow(float[,,] f, string filename) and public static float[,,] ReadOpticalFlow(string filename) do just that.

Wrap Up

This is an advanced topic and I don't know how well of a job I have done explaining the code. I solicit your comments to make this article better.