Creating your own animation format

Paulo Zemek

4.96/5 (21 votes)

Feb 1, 2013

CPOL

14 min read

38052

729

Learn how to create your own animation format, capable of doing basic compression.

Download sample - 259.4 KB

Background

I always liked to manipulate images by code, be it by creating images and animations by code or by manipulating photos (again, by code). I already wrote two articles on how to manipulate bitmaps in a relatively fast & safe way in .Net but, even if I can create animations by drawing all the frames I need, something I miss in .Net is a framework capable of reading and writing animations (or videos) using easy to understand methods and normal Streams.

I already used AForge. I didn't look at the latest version of the library, but the problem I had with it is that I can't (or couldn't) write frame by frame to .Net Stream, forcing me to tell a file name where it will write the final video (and I wanted to send the video by TCP/IP). At that time I solved the problem by streaming each frame individually but now I decided it was time to do a small animation framework.

Being minimalist

To be minimalist, we need a video producer (an interface) with an WriteFrame method to create fixed-time animations or animations that send the actual frame on client requests (like TCP/IP based animations) and a consumer interface, to read the next frame.

Considering I implement such interfaces, I would be already capable of writing frames to a TCP stream and then receive those frames by client on the other side. The implementation could encode each frame as a normal image or could use any algorithm to detect the changes done to the frame and send only those changes.

As I said, I am being minimalist. One interface is only responsible to add frames while the other is only responsible to read those frames.

The interfaces could be as simple as this:

public interface IFrameWriter
{
  void AddFrame(Bitmap frame);
}

public interface IFrameReader
{
  Bitmap GetNextFrame();
}

Problems

As simple as those interfaces look, we already have problems.

Bitmap. What is a Bitmap after all? The Bitmap class that works for Windows Forms is not the same Bitmap that works for WPF. In both articles where I talked about fast bitmap manipulation I used references to the System.Drawing or to the WPF libraries. I want a solution that is not bound to anyone of those but at the same time that can be used by both;
While reading a video we are not forced to create a new Bitmap each time. In fact, the best solution is to create a single bitmap and then always update its content, avoiding a lot of unnecessary allocations (and consequently, a lot of unnecessary collections);
When adding frames we may face the same problem. The algorithm may want to change the bitmap before streaming it. But, if we want to keep showing it, we should not allow that and we must copy the bitmap before;
You may ask me if I forgot to pass a Stream as parameter. I really think the frame-encoders or decoders must know how to access the data on the right places, be it using .Net Streams, native File Handles or even writing each frame as a separate file. So, even if this looks strange, I didn't forget a parameter and we only have 3 problems.

I'd like to talk about how to use the minimalist approach directly, but the Bitmap is enough to force me to create another interface, so I will continue with it.

The ManagedBitmap... again

To correct the minimalist interfaces, I should get rid of that Bitmap reference as that will create a direct reference to the System.Drawing.dll or the PresentationCore.dll and I don't want to force WPF applications to load the System.Drawing.dll as I also don't want to make Windows Forms applications to load the PresentationCore.dll (and also I don't want to make Silverlight or Console Application to have to load any of those libraries).

My solution is a reduced and safe only version of the ManagedBitmap I presented on the article Managed Bitmaps. The new ManagedBitmap is simple a matrix (or double-dimensional array) like class, that internally works over a normal array and with some graphic specific properties and methods.

I am planning to add more methods to the ManagedBitmap (methods to draw lines, rectangles etc) but for the moment and to keep the minimalistic approach, its important traits are:

It can get or set a pixel value by its coordinates;
It can be constructed by an array and a Width (so it can reutilize an existing bitmap pixel array [like Silverlight, even if I didn't test it in Silverlight]);
It can return its PixelArray, so you can easily convert its content to another kind of Bitmap;
Ok, it has other methods but we will see some of them later.

And, if you didn't saw the code or the other articles, it is important to know that the ManagedBitmap is a generic class. So, a pixel can be of type int, of type Argb32, of type Rgb24 or any type that can represent a pixel color, be it indexed or not.

So, having a bitmap type that is independent from Windows Forms or WPF (or anything else) we can finally recreate our minimalistic interfaces.

But now we have another problem. The ManagedBitmap is generic and that forces us to choose a pixel type or to make our interface generic too.

Which is the solution I choose?

If you know me or if you are looking at the code, you know that I chose both. I try to create an interface everytime I create a generic class. The interface is there to access the object without knowing its generic arguments at compile-time (which avoids using Reflection if you need to use it that way).

So, the ManagedBitmap itself implements the IManagedBitmap interface. And that's where our minimalistic interfaces start.

public interface IFrameWriter
{
  bool ChangesContent { get; }
  bool RequiresDifferentBitmap { get; }
  void WriteFrame(IManagedBitmap frame);
}
public interface IFrameReader
{
  bool ChangesContent { get; }
  IManagedBitmap ReadFrame();
}

And then I also create the generic version of those interfaces, so:

public interface IFrameWriter<T>:
  IFrameWriter
{
  void WriteFrame(ManagedBitmap<T> frame);
}
public interface IFrameReader<T>:
  IFrameReader
{
  new ManagedBitmap<T> ReadFrame();
}

You may notice that both interfaces have a ChangesContent property. That is because the FrameWriter may need to change the bitmap for its own logic and the FrameReader may opt to create a single bitmap and always change its content instead of allocating many bitmaps.

But what will happen if you still need the unchanged bitmap?

Well, it is up to you, the caller, to create a clone if needed. The properties are there to avoid unnecessary clones in those cases that the implementation does not change the bitmap.

And that RequiresDifferentBitmap?

In this case, the FrameWriter may be storing the old bitmap to check for changes. It is not the responsibility of the FrameWriter to clone the image to guarantee that it will contain the good old image the next call the WriteFrame() is called. It is the caller responsibility to check for that.

I did that for performance reasons, as cloning every frame may take unnecessary time if there is always a new bitmap when WriteFrame() is called. But if that's not the case, it is good to check this property and clone the bitmap if needed.

Making the first animation writer & reader

With those interfaces we are already prepared to create a simple animation format/streaming.

The truth is: We don't have any way to tell the frame-rate at this moment (not through the interfaces) and that can be very problematic. But, we don't need to worry yet. When streaming webcam frames, for example, we only need to send a new frame when the other side has actually finished receiving the old frame.

So, we will need to add extra information to create a real video-streamer or a better format later, but let's not focus on that for the moment.

A very simple video animation can be done by saving each frame as an entire image, be it PNG, JPG or whatever.

It is not the recommended approach, but in the Sample application you can find this FrameWriter:

using System;
using System.IO;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using Pfz.Imaging;
using Pfz.Imaging.Wpf;

namespace PfzImagingSample
{
    public sealed class MultiplePngRgb24FrameWriter:
        FrameWriter<Rgb24>
    {
        private string _directory;
        public MultiplePngRgb24FrameWriter(string directory)
        {
            if (directory == null)
                throw new ArgumentNullException("directory");

            _directory = directory + "\\";
            Directory.CreateDirectory(directory);
        }

        private int _count;
        public override bool ChangesContent
        {
            get
            {
                return false;
            }
        }

        public override bool RequiresDifferentBitmap
        {
            get
            {
                return false;
            }
        }

        public override void WriteFrame(ManagedBitmap<Rgb24> bitmap)
        {
            if (bitmap == null)
                throw new ArgumentNullException("bitmap");

            int width = bitmap.Width;
            int height = bitmap.Height;
            var writeableBitmap = new WriteableBitmap(width, height, 96, 96, PixelFormats.Rgb24, null);
            bitmap.CopyTo(writeableBitmap);

            var encoder = new PngBitmapEncoder();
            var frame = BitmapFrame.Create(writeableBitmap);
            encoder.Frames.Add(frame);

            _count++;
            using(var stream = File.Create(_directory + _count + ".png"))
                encoder.Save(stream);
        }
    }
}

Ok, it works and if instead of saving the files to disk I simple wrote all of them to a a single Stream I could send videos frame-by-frame over TCP/IP.

But if I simple wanted to send each frame as an entire bitmap I surely didn't need to use my managed bitmaps and this implementation is WPF specific. Unfortunately, I don't know all the inner details to recreate the PNG compression without using other references, but this is only a sample and is not our main goal.

So... let's continue.

Making the animation work as a frame difference format

The entire purpose of having the ManagedBitmap class is to have an easy way to access and manipulate bitmap pixels, and animation is a great place to do that.

Without other compression techniques, one of the simplest compression logics that exist is to detect which area has been changed.

For example, if I have an entire black frame

and the next frame simple has a small text on it

I can detect that the changed area is this

So, that's the first thing we can do to create our own video format.

I added the method GetDifferenceArea on the class. The method looks like this:

public DifferenceArea GetDifferenceArea<TOther>(int thisLeft, int thisTop, ManagedBitmap<TOther> otherBitmap, int otherLeft, int otherTop, int width, int height, Func<TOther, T, bool> arePixelsSimilarDelegate)
{
  int thisWidth = _width;
  if (thisLeft < 0 || thisLeft >= thisWidth)
    throw new ArgumentOutOfRangeException("thisLeft");

  int thisHeight = Height;
  if (thisTop < 0 || thisTop >= thisHeight)
    throw new ArgumentOutOfRangeException("thisTop");

  if (otherBitmap == null)
    throw new ArgumentNullException("otherBitmap");

  int otherWidth = otherBitmap._width;
  if (otherLeft < 0 || otherLeft >= otherWidth)
    throw new ArgumentOutOfRangeException("otherLeft");

  int otherHeight = otherBitmap.Height;
  if (otherTop < 0 || otherTop >= otherHeight)
    throw new ArgumentOutOfRangeException("otherTop");

  int thisRight = thisLeft + width;
  int otherRight = otherLeft + width;
  if (width < 0 || thisRight > width || otherRight > width)
    throw new ArgumentOutOfRangeException("width");

  int thisBottom = thisTop + height;
  int otherBottom = otherTop + height;
  if (height < 0 || thisBottom > height || otherBottom > height)
    throw new ArgumentOutOfRangeException("height");

  if (arePixelsSimilarDelegate == null)
    throw new ArgumentNullException("arePixelsSimilarDelegate");

  int thisIndex = thisTop * thisWidth + thisLeft;
  int otherIndex = otherTop * otherWidth + otherLeft;
  int thisAdd = thisWidth-width;
  int otherAdd = otherWidth-width;

  int minX = int.MaxValue;
  int maxX = int.MinValue;
  int minY = int.MaxValue;
  int maxY = int.MinValue;

  var thisPixels = _pixelArray;
  var otherPixels = otherBitmap._pixelArray;

  for(int y=0; y<height; y++)
  {
    for(int x=0; x<width; x++)
    {
      T thisPixel = thisPixels[thisIndex];
      TOther otherPixel = otherPixels[otherIndex];

      bool arePixelsSimilar = arePixelsSimilarDelegate(otherPixel, thisPixel);
      if (!arePixelsSimilar)
      {
        if (x < minX)
          minX = x;

        if (x > maxX)
          maxX = x;

        if (y < minY)
          minY = y;

        maxY = y;
      }

      thisIndex++;
      otherIndex++;
    }

    thisIndex += thisAdd;
    otherIndex += otherAdd;
  }

  if (maxX == int.MinValue)
    return null;

  return new DifferenceArea(minX, minY, maxX, maxY);
}

The code is complex because I wanted to cover all the possibilities and the method allows to get the difference area even if the two bitmaps have different pixel types, and at specific locations which can be different for each bitmap. But at this moment we are being minimalist so you will probably want to see this overload:

public DifferenceArea GetDifferenceArea(ManagedBitmap<T> otherBitmap)
{
  var result = GetDifferenceArea(otherBitmap, null);
  return result;
}

With it, we give a second bitmap to compare with the actual one. There is nothing complicated in that.

If there is no difference, null is returned. If there is a difference, a result of type DifferenceArea is returned, which contains the offsets of the different area.

So, without any other compression, I am already capable of identifying that the change from the first bitmap to the second (both of 300x225 pixels) is an area of only 113x13 pixels. It is easy to note that it will be smaller, right?

Compressing even more

Surely we can compress even more. If we use a DeflateStream when writing our bitmap data, we will have a normal byte compression that will be applied over our reduced bitmap.

In my first sample there was no background (it was simple black) and that really allows a DelateStream to compress a lot, but what if our images have a background, like in this case:

And the second image is like this:

Our difference image can look like this:

Or like this:

Without another compression technique, both will have the same size. But it is not hard to imagine that for a byte compression algorithm, like Deflate, the second one is easier to compress.

Implementation note: I don't know in which block sizes DeflateStream works but I initially was writing byte-by-byte, expecting that it will compress only when "flushing", but the result was becoming bigger than the non-compressed image, so I decided to first write everything to a buffer and then write the entire buffer to the DeflateStream and then it worked fine.

Ok, the actual result is not quite that. I subtract each color element from one image to another. That makes equally colored pixels become black, but different pixels will not have their original solid color, it will be a color that summed with the old value will gives the right color. Also, when debugging video frames I wanted to see the intermediate images and I added 127 to each color element, so darker places become dark gray, identical pixels become gray and lighter pixels became light gray... and I was surprised that DeflateStream was able to compress more, so I kept that in the final algorithm.

The generated difference image is this (I zoomed it to make it easier to understand):

Lossy compression

Until this moment I was presenting the lossless compression. That is, at the end, the first image with the applied difference is identical to the second image. But to achieve a better compression we may want to lose quality, effectively making the animation easier to be compressed and faster to send over the network.

I will not show every possible way of lossy compression, but I will show a simple one. Color loss.

It is known that human beings don't see all the colors 16 millions colors the computer generates. In fact, even if we do see all of them, many monitor configurations can already change colors, so usually small color losses don't affect the result.

For example, take a look at those two squares.

and

One uses the color as 255, 255 and 255, while the other uses the color as 254, 254 and 254.

Did you noticed the difference?

If you answered yes, then congratulations. Most people will simple not see any difference and, even if you see a difference, do you think it will make any difference inside a photo?

By reducing the number of colors an image have you will also help the Deflate algorithm work better. There are a lot of algorithms that can be used to reduce the number of colors of an image (some of which are capable of choosing the most appropriate intermediate color).

But the algorithm I will use is extremely simple. Each color element is divided by a value (and as integers they will lose the possible floating part) and then multiplied again by the same value. The bigger the value used in the division and multiplication, the more colors are lost (or joined). Extremely simple, but it is enough to complete our sample.

The code to do the color reduction is this:

public static PixelManipulator<Rgb24> Get(int factor)
{
  if (factor < 2)
    throw new ArgumentOutOfRangeException("factor", "factor must be at least 2.");

  return
  (ref Rgb24 pixel) =>
  {
    int r = pixel.Red/factor;
    r *= factor;

    int g = pixel.Green/factor;
    g *= factor;

    int b = pixel.Blue/factor;
    b *= factor;
    pixel = new Rgb24((byte)r, (byte)g, (byte)b);
  };
}

Such methods returns a delegate of type PixelManipulator, which can then be applied to the ManagedBitmap by callig the method ManipulateAllPixels(). To show it working, see the results for the frame 46 of the sample saved in .png after the color loss:

Lossless (127kb)	1/2 color ranges (113kb)

1/3 color ranges (98kb)	1/16 color ranges (48kb)

Video Improvements

To make a real video format I suggest that the actual interfaces are not changed. New interfaces (IVideoEncoder, IVideoDecoder) should be created, based on the actual interfaces and adding methods to read and write extra information (like frame-rate, how many time to wait for the next frame and so on);

Doing better

Surely it is possible to do better. There are a lot of algorithms to encode video, capable of determining which areas (instead of which area) in the image have changed and even different lossy algorithms that could do great compressions.

At the moment I am only showing a sample of how it could be done. I can say that PNG still compress better than my format for single images, so I am pretty sure that if I save the differences as PNG images I will get a better compression.

In fact, I sincerely hope my interfaces and classes can be used as the base to implement industry-standard formats to encode videos using only managed APIs.

The Sample

The sample is composed of two applications. One has a basic animation running and allows to save it using multiple PNG images or using my algorithm of modification detection + DeflateStream.

The other application is capable of playing the saved animations. You must drag and drop the file or folder created by the other application to see the animation.

The samples don't have exception handling and even if the AnimationWriter application only saves a single animation, the classes are there to be used to your own animations. Also, you can check the different results of the lossy algorithm with it (I got intrigued that the biggest color loss creates bigger files than the 1/15 loss on the deflate algorithm).

A Few Notes

When writing the article I did an animation in javascript similar to the one done in the sample. Unfortunately, codeproject is not running my javascript, so I removed the animation. I tried to make the animation in GIF, but it was getting huge (4mb... in my own format with the highest quality it gets 429kb... but gif was also losing colors).

So, I really believe my format is doing a good job... but I am frustrated that I can't put my animation on the article.

Version History

February, 6th - Corrected the reference of Windows Forms and WPF to be System.Drawing.dll and PresentationCore.dll
February, 4th - Created the IPixelFormat interface, made the Rgb24 type implement it and also created the Bgr24 and Bgra32 types. Made the DeflateFrameReader and DeflateFrameWriter generic.
February, 1st - First version.