Visual Detection of Motion — Part II

(Part I)


In the first part I used global population statistics of individual color histograms to detect motion, largely because it was intuitively simple and easy to process. It has many limitations.

A better approach focuses on local properties of the imagery to perform detection. It requires a different kind of processing, called morphological filtering. I wrote a common set of morphological filters, which have been added to the development version of the Common Lisp library CL-PNG, making them easy to use in this context.

Image Processing Functions

CL-PNG started out as a Common Lisp library for manipulating PNG (Portable Network Graphics) images. It has grown a bit since then, and now includes conversions for BMP files and a growing number of image processing functions. This additional functionality has been available via Subversion since version 0.6 but hasn't been documented before. This page describes much of the undocumented functionality now available in the package through Subversion.(1)

This page serves two purposes: as a tutorial on image processing techniques to perform motion detection, and as a demonstration of many of the image processing functions now available within CL-PNG.

(1) The subversion repository is available at

Motion Detection Example

I will start with a short sequence of images taken with a web camera that includes motion — in this case of the author walking across a room (while being followed by a dog who could not be convinced to do otherwise). The capture rate was roughly three frames per second. Figure 1 shows an early sample frame from the sequence.

Simulating capture data

For algorithm development, we don't want to have to keep getting up and walking across the room (and arguing with the dog) every time we make a little change to our code and try it out. It's easier to develop an understanding of what's going on when working with a repeatable sequence of data. So, what we want is a testbench that simulates capturing the data we've already sampled.

  ;;;## Simulate capturing data each time CAPTURE-IMAGE is called until
  ;;;   all images in sequence are exhausted.
  (let ((image-sequence *test-sequence*)
        (current-image-path nil))
    (defun reset-capture-sequence (&optional (seq *test-sequence*))
      (setf current-image-path nil
            image-sequence     seq))
    (defun get-current-image-path () current-image-path)
    (defun capture-image (&optional flip)
      (cond ((null image-sequence) nil)
            (t (setf current-image-path
                     (enough-namestring (car image-sequence)))
               ;; Return the decoded image, then advance the sequence.
               (prog1 (bmp::decode-file (car image-sequence) :flip flip)
                 (setf image-sequence (cdr image-sequence)))))))

Figure 1. Example from motion sequence. The frame resolution is 640x480 pixels but I've scaled them down by half for these thumbnails.

Basic Operations

Conversion to Grayscale

One of the first operations we need to do is convert a color RGB image to grayscale. We will use the grayscale-image function:

  (let* ((image (capture-image))
         (gr (image::grayscale-image image)))
    (bmp::encode-file gr "/tmp/gray.bmp")
    (ext-program "feh" '("/tmp/gray.bmp")))

Each time the above script is called, the next "captured" image is converted to a grayscale image and displayed. Obviously, it could be put in a loop which goes through all of them at once. But our purpose here is merely to demonstrate the transformation. Figure 2 shows the outcome of one such conversion.
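As a sketch of that loop (assuming the CAPTURE-IMAGE testbench defined above, which returns NIL once the sequence is exhausted, and the same ext-program display helper):

```lisp
;; Sketch: convert every "captured" frame to grayscale in one pass.
;; RESET-CAPTURE-SEQUENCE rewinds the testbench first; the loop ends
;; when CAPTURE-IMAGE returns NIL.
(reset-capture-sequence)
(do ((img (capture-image) (capture-image)))
    ((null img))
  (bmp::encode-file (image::grayscale-image img) "/tmp/gray.bmp")
  (ext-program "feh" '("/tmp/gray.bmp")))
```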

A Merge Filter

The merge-rgb-gray filter combines a grayscale image with one of the channels in an RGB image, choosing whichever has the maximum value at each pixel. We will use it to display our results by making the detected regions stand out strongly in red.

Figure 2. Example grayscale-converted image.

  (defun merge-rgb-gray (rgb gray channel)
    "Merges GRAY image into RGB CHANNEL via maximum."
    (unless (equalp (image:size rgb) (image:size gray))
      (error 'mismatched-image-sizes
             :sizes (list (image:size rgb) (image:size gray))))
    (let ((new (image:make-image-like rgb)))
      (dotimes (i (image:image-height rgb) new)
        (dotimes (j (image:image-width rgb))
          (dotimes (k (min 3 (image:image-channels rgb)))
            (if (= channel k)
                (setf (aref new i j k) (max (aref rgb i j k) (aref gray i j 0)))
                (setf (aref new i j k) (aref rgb i j k))))))))

Image Arithmetic

The basic arithmetic functions are add, add*, subtract, and subtract*. The starred versions operate destructively (in place). The image sizes must match, and out-of-range results are handled by clipping to the valid pixel range.
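As a hypothetical usage sketch (the exact package prefixes may differ in your checkout), differencing two captured frames looks like:

```lisp
;; Sketch: SUBTRACT allocates a new image and clips each pixel to the
;; valid range; the starred ADD* overwrites its first argument instead.
;; Both signal an error if the image sizes differ.
(let* ((a (capture-image))
       (b (capture-image))
       (diff (image::subtract a b)))   ; non-destructive difference
  (image::add* diff b)                 ; destructive: DIFF += B, clipped
  diff)
```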

Simple Transformations

The mirror function flips an image horizontally. The flip function flips it vertically. The rotate function rotates an image positive 90 degrees (counter-clockwise — we're mathematicians here).
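These transformations compose naturally; a small sketch (assuming the functions live in the image package, as with the other filters here):

```lisp
;; Sketch: rotating twice gives a 180-degree rotation, which is
;; equivalent to mirroring horizontally and then flipping vertically.
(let ((img (capture-image)))
  (bmp::encode-file (image::rotate (image::rotate img)) "/tmp/rot180.bmp")
  (bmp::encode-file (image::flip (image::mirror img))   "/tmp/flipmir.bmp"))
```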

Binary Morphological Filters

Binary filters are filters which act on binary images: images whose pixels have values of either 0 or 1. A binary-morphological-filter returns a binary image whose pixels are determined by imposing a mask on the surrounding pixels and combining them with an operation: and, or, or maj. The and operation requires all the pixels intersecting the mask to be set, or requires any of them to be set, and maj requires a majority of them to be set.

A binary-morphological-filter which applies a simple centered uniform cross or square mask with the and operation is generally called an erosion-filter.

Conversely, a binary-morphological-filter which applies the same type of mask with the or operation is generally called a dilation-filter.

More complex designs, which preserve object size while exploiting the characteristics of the initial filter, are obtained by cascading these two filters, resulting in the open-filter and close-filter. Application details of all these filters can be seen in the motion tutorial code itself.
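Under those definitions, an open filter is simply an erosion followed by a dilation with the same mask; a hypothetical sketch of that cascade (MY-OPEN-FILTER is an illustrative name, not part of the library):

```lisp
;; Sketch: OPEN as erosion-then-dilation with the same mask. The erosion
;; removes isolated noise pixels; the dilation grows the survivors back
;; toward their original extent. A CLOSE filter cascades the same two
;; operations in the opposite order.
(defun my-open-filter (binary-image
                       &key (mask (image::generate-mask :cross 2)))
  (image:dilation-filter
   (image:erosion-filter binary-image :mask mask)
   :mask mask))
```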

General 2-d convolution can be performed with the convolve function, which takes an image and a kernel as arguments.
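For instance, a uniform 3-by-3 box-blur kernel smooths an image; a sketch, assuming convolve accepts a float kernel like the edge kernel used later:

```lisp
;; Sketch: smoothing via convolution with a uniform 3x3 kernel
;; (all weights 1/9).
(defparameter *blur-kernel*
  (make-array '(3 3) :element-type 'float
              :initial-element (/ 1.e0 9.e0)))

(defun blur (image)
  (image:convolve image *blur-kernel*))
```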

A Simple Approach to Motion Detection

The simplest approach to motion detection is to compare the current frame with the previous one. Let's see how well it works. We'll call the current full-color frame IMG, a grayscale copy of it CURRENT-FRAME, and the previous grayscale frame BACKGROUND-FRAME. The first thing we do is find the regions where these two frames differ. We will use the difference and threshold filters for this. As with the image sequence itself, we set up closure functions for interacting with its scoped variables.

  (let ((frame nil)
        (background nil))

    (defun update-background (bkgnd-fcn)
      (setf background (funcall bkgnd-fcn background frame)))

    (defun get-background () background)

    (defun reset-process-cycle ()
      (setf background nil
            frame nil))

    (defun process-cycle (motion-fcn bkgnd-fcn)
      "Returns the image or NIL if none captured."
      (update-background bkgnd-fcn)
      (let ((image (capture-image T)))
        (when image
          (setf frame (grayscale-image image))
          (when background
            (let ((difference (image::subtract background frame)))
              (funcall motion-fcn difference image)))
          image))))

The process function, which carries out the image processing, is passed to the process cycle. It defines the following sequence of operations:

  1. applies threshold-filter to image;
  2. applies erosion-filter to thresholded image, using a simple 2-by-2 cross mask;
  3. merges the maximum pixels from the results into the red channel of the original image with the merge-rgb-gray filter.

The identity-background function is needed to initialize the background estimation process.

  (defun identity-background (background new-frame)
    (declare (ignorable background))
    new-frame)

  (let ((filelst nil))
    (defun reset-filelst (file)
      (setf filelst (graham:mklist file)))
    (defun saveit (image file)
      (let ((p (merge-pathnames file "/tmp/")))
        (ensure-directories-exist p)
        (png::encode-file image p)
        (push (namestring p) filelst)))
    (defun get-filelst () (reverse filelst)))

  (defun process (difference image)
    (declare (ignorable image))
    (let* ((thresh  (image:threshold-filter difference 20))
           (mask    (image::generate-mask :cross 2))
           (eroded  (image:erosion-filter thresh :mask mask))
           (hilite  (merge-rgb-gray image eroded 0)))
      (reset-filelst NIL)
      (saveit difference "diff.png")
      (saveit thresh "thresh.png")
      (saveit eroded "eroded.png")
      (saveit hilite "hilite.png")))

  (process-cycle #'process #'identity-background)
  (ext-program "feh" (print (get-filelst)))

Every time the PROCESS-CYCLE function is called, the difference image is updated by successive differencing (see Figure 3).

The difference image is thresholded to a binary image (see Figure 4).

The binary thresholded image is noisy, so an erosion filter is applied (see Figure 5). Notice that the noise in the central image is now gone; however, the detection region has also diminished slightly.

Finally, the erosion filter result is used as a highlight cue and merged back into the red channel of the original RGB image (see Figure 6).

Figure 6. Resulting motion-highlighted image (in red).

While a bit of a red outline can be seen along the leading edge of forward motion, there really wouldn't be anything here for a pattern recognition system to work with. But it is a start. The problem with this algorithm is that differences in the interiors of large moving objects (those one is usually interested in) are mostly cancelled out each time because they are fairly homogeneous and overlap. The previous image is just not a good estimate of the real background. We need a better background estimate.

Figure 3. Output of differencing background estimate from current image.

Figure 4. Thresholding the difference results in a binary image with obvious motion signal and noise.

Figure 5. An erosion filter reduces noise and consolidates motion signal, but shrinks it as well.

A Better Background Estimate

Successive differences are not very effective for motion detection. The interiors of most moving objects are usually similar enough that their difference is minimal, leaving only the edges to detect on. We will use the MOVE-TOWARDS filter to successively improve a background approximation. Each capture will contribute to this approximation, and gradually the noise will drop out, leaving only the constant imagery or that which changes very slowly.

  (defun movetowards-background (background new-frame)
    (if background
        (image::move-towards background new-frame 10)
        (setf background new-frame)))
  (defun process (difference image)
    (declare (ignorable image))
    (let* ((thresh  (image:threshold-filter difference 20))
           (mask    (image::generate-mask :cross 2))
           (backgd  (get-background))
           (open    (image:open-filter      thresh :mask mask))
           (hilite  (merge-rgb-gray image open 0)))
      (reset-filelst NIL)
      (saveit hilite "hilite.png")
      (saveit backgd "backgd.png")))

The background image from the third frame in the sequence is shown in Figure 7, and even though it appears to have become only slightly more faint, the improvement in the difference image shown in Figure 8 is quite striking.

Thresholding this difference results in Figure 9, which clearly shows a much larger region to detect on.

We can also improve on the erosion filter by using an open filter instead. Like the erosion filter, it eliminates noise, but whereas the erosion filter tends to shrink the detection region, the open filter reverses this effect by following up with an opposing operation that preserves the size. The result is shown in Figure 10.

The final motion-highlighted original is shown in Figure 11.

Figure 11. The improved motion-highlighted image.

Figure 7. Successive approximation background estimate after three frames.

Figure 8. Difference image between current capture and successive approximation background.

Figure 9. Threshold-filtered image.

Figure 10. Open filtering reduces noise while preserving size (compare with the erosion filter).

Edge Detection

An edge detector can be built using a convolutional filter with a differentiating kernel.

  ;;; Convolution kernel used for edge detection:
  ;;;  0 -1  0
  ;;; -1  4 -1 
  ;;;  0 -1  0

Here is the edge detector processing sequence:

  (defparameter *edge-kernel*
    (let ((mask (make-array (list 3 3)
                            :element-type 'float :initial-element 0.e0)))
      (setf (aref mask 0 1) -1.e0
            (aref mask 1 0) -1.e0
            (aref mask 1 2) -1.e0
            (aref mask 2 1) -1.e0
            (aref mask 1 1)  4.e0)
      mask))

  (defun edge-detect (image &key (kernel *edge-kernel*) (fill '(#x0)))
    (image:convolve image kernel :fill fill))

  (defun process (difference image)
    (declare (ignorable image))
    (let* ((thresh  (threshold-filter difference 20))
           (mask    (image::generate-mask :cross 2))
           (open    (image:open-filter thresh :mask mask))
           (outline (edge-detect open))
           (hilite  (merge-rgb-gray image outline 0))
           (pth     (get-current-image-path))
           (skl     (catstr "junk/t_" (pathname-name pth) "_hilite.png")))
      (saveit hilite skl)))

  (reset-filelst NIL)
  (process-cycle #'process #'movetowards-background)
  (do ()
      ((null (process-cycle #'process #'movetowards-background))))

One of the motion-highlighted originals is shown in Figure 12, and the next one in the sequence in Figure 13.

This processes the entire sequence of captures which can be viewed sequentially with the following one-liner:

  (ext-program "feh" (print (get-filelst)))

Or you can look at the video clip I made of it.(2)

Figure 12. An outline motion-highlighted image (linked to a higher-res version).

Figure 13. The next outline motion-highlighted image (linked to a higher-res version).

(2) I made three versions of the final video clip:

The last two are supposed to be compatible with Windows. You should be able to play one of these no matter what kind of system you are on.

There are free video codecs available for Windows if you can't view any of them.

The Next Step

The next thing to do is develop a metric based on our processed products. But this is enough for one page.