
Extending Dataset with Data Augmentation

21 Dec 2020, CPOL, 4 min read
In this article, we’ll see how to enlarge our dataset with data augmentation instead of gathering more images.
We’ll give a very short introduction to the data augmentation technique, select the augmentation methods that suit our dataset, provide and explain the Python code that implements them, and finally show examples of the resulting images.


Unruly wildlife can be a pain for businesses and homeowners alike. Animals like deer, moose, and even cats can cause damage to gardens, crops, and property.

In this article series, we’ll demonstrate how to detect pests (such as a moose) in real time (or near-real time) on a Raspberry Pi and then take action to get rid of the pest. Since we don’t want to cause any harm, we’ll focus on scaring the pest away by playing a loud noise.

You are welcome to download the source code of the project. We are assuming that you are familiar with Python and have a basic understanding of how neural networks work.

In the previous article, we prepared the initial version of our moose classification dataset. It included 200 samples for the moose class and 284 samples for the background class. Rather small, right? In this article, we’ll demonstrate how to enhance our dataset without gathering new images.

Data Augmentation

Data augmentation provides a way to derive new samples from existing images using various image modifications. The most common of these are geometric transformations, color space transformations, data noising, and image filtering. When applying data augmentation algorithms, we should ensure that the image modification is suitable for our dataset, and that it does not harm the samples.

Let’s consider our moose dataset. Which of the modification types would work for it?

Color space transformations and data noising won’t do. Color is an important feature for animal classification, so we should not change the color space of our sample images. Data noising is an effective technique for datasets containing handwritten digits or symbols. We don’t expect this to be useful for images of moose in their natural habitat.

This leaves us with geometric transformations and image filtering. We’ll consider these in detail as we implement them in Python.

Geometric Transformations

Quite a few geometric transformations could be used for image data augmentation: flipping, rotation, zooming, cropping, translation, and so on.

An obvious and very useful transformation is the horizontal flip, which mirrors an image relative to the vertical axis.
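In terms of the pixel array, a horizontal flip simply reverses the column order. The following NumPy sketch illustrates this; `hflip` is our own illustrative helper, not part of the project, and for a standard H×W or H×W×C array it should match what `cv2.flip(image, 1)` returns.

```python
import numpy as np

def hflip(image: np.ndarray) -> np.ndarray:
    """Mirror an image left-to-right (around the vertical axis).

    Reversing axis 1 (the columns) works for grayscale (H x W) and
    color (H x W x C) arrays alike. np.ascontiguousarray turns the
    strided view produced by the slice into a compact copy.
    """
    return np.ascontiguousarray(image[:, ::-1])

# A tiny 2x3 "image": each row is mirrored, the row order is unchanged.
img = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.uint8)
flipped = hflip(img)
print(flipped.tolist())  # [[3, 2, 1], [6, 5, 4]]
```

A vertical flip (`cv2.flip(image, 0)`, or `image[::-1]`) would turn the moose upside down, which is not a view we expect the camera to see, so only the horizontal variant seems useful here.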

Another popular geometric transformation is rotation, which emulates viewing the sample from different angles. Note that this transformation should apply only modest rotation angles to preserve the object’s shape.
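Under the hood, `cv2.getRotationMatrix2D` builds a 2×3 affine matrix that `cv2.warpAffine` then applies to every pixel. To make the roles of the angle and scale parameters concrete, here is a NumPy sketch of that matrix, following the formula given in the OpenCV documentation (α = scale·cos θ, β = scale·sin θ). The helpers `rotation_matrix_2d` and `map_point` are our own illustrative names, not OpenCV functions.

```python
import math

import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Build the 2x3 affine matrix documented for cv2.getRotationMatrix2D.

    Positive angles rotate counterclockwise as displayed (image origin at
    the top-left corner); the center point stays fixed.
    """
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

def map_point(m, x, y):
    """Map a point (x, y) through a 2x3 affine matrix."""
    return m @ np.array([x, y, 1.0])

# 640x480 frame rotated by 15 degrees with a 1.2 zoom, as in the article.
m = rotation_matrix_2d(center=(320, 240), angle_deg=15, scale=1.2)
print(map_point(m, 320, 240))  # approximately [320, 240]: the center stays put
```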

We can easily implement geometric transformations using the Python OpenCV package:

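import cv2

class FlipProcessor:
    def name(self):
        return "flip"

    def process(self, image):
        # flipCode=1 mirrors the image around the vertical axis (left-right)
        return cv2.flip(image, 1)

class RotateProcessor:
    def __init__(self, angle, scale):
        self.angle = angle  # degrees; positive values rotate counterclockwise
        self.scale = scale  # zoom factor applied together with the rotation

    def name(self):
        # e.g. "rotate15" for +15 degrees, "rotate_15" for -15 degrees
        if self.angle > 0:
            sa = str(self.angle)
        else:
            sa = "_" + str(-self.angle)
        return "rotate" + sa

    def process(self, image):
        h, w = image.shape[:2]
        center = (w / 2, h / 2)

        # Rotate around the image center and keep the original frame size
        rmat = cv2.getRotationMatrix2D(center, self.angle, self.scale)
        rotated = cv2.warpAffine(image, rmat, (w, h))

        return rotated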
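Rotating a frame while keeping the same output size leaves empty (black) corners, and the `scale` argument of `RotateProcessor` zooms in to hide some of them. The article does not say how the value 1.2 used later was chosen; as a rough check, the sketch below computes the smallest scale at which a w×h frame rotated about its center fills the whole output. It is derived by requiring every output corner, mapped back into the source image, to land inside it; `min_cover_scale` is a hypothetical helper written for this illustration.

```python
import math

def min_cover_scale(w, h, angle_deg):
    """Smallest scale so a w x h image rotated by angle_deg about its
    center (output size unchanged, as with cv2.warpAffine above) has no
    empty corners.

    An output corner at offset (+-w/2, +-h/2) from the center comes from
    the source offset R(-angle) * corner / scale. Keeping that offset
    inside the source rectangle gives
        scale >= cos(t) + max(w/h, h/w) * sin(t).
    """
    t = math.radians(abs(angle_deg))  # assumes |angle| <= 90 degrees
    return math.cos(t) + max(w / h, h / w) * math.sin(t)

for w, h in [(480, 480), (640, 480), (1280, 720)]:
    print(f"{w}x{h}: {min_cover_scale(w, h, 15):.3f}")
# 480x480: 1.225
# 640x480: 1.311
# 1280x720: 1.426
```

By this estimate, a scale of 1.2 almost hides the corners of a square frame after a 15-degree rotation but still leaves small empty corners on typical 4:3 or 16:9 video frames; larger angles would need even more zoom, which is one more reason to keep the rotation modest.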

Image Filtering

Image filtering is a useful but unintuitive type of data augmentation. It’s hard to estimate which filters will produce the most useful data sample from an existing image.

Our dataset consists of frames extracted from a video of wildlife, so it is fair to assume that some frames are sharp and some are blurred. Smoothing and sharpening the frames can therefore improve and diversify our dataset. Here is how this looks in Python code:

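import cv2

class SmoothProcessor:
    def __init__(self, filter_size):
        self.filter_size = filter_size  # odd Gaussian kernel size, e.g. 5

    def name(self):
        return "smooth" + str(self.filter_size)

    def process(self, image):
        # Gaussian blur; sigma=0 lets OpenCV derive sigma from the kernel size
        smoothed = cv2.GaussianBlur(image, (self.filter_size, self.filter_size), 0)

        return smoothed

class SharpenProcessor:
    def __init__(self, filter_size):
        self.filter_size = filter_size  # diameter of the pixel neighborhood

    def name(self):
        return "sharpen" + str(self.filter_size)

    def process(self, image):
        # Bilateral filter (sigmaColor=150, sigmaSpace=150): smooths flat
        # areas while preserving strong edges, so the edges stay crisp
        sharpen = cv2.bilateralFilter(image, self.filter_size, 150, 150)

        return sharpen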
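Strictly speaking, `cv2.bilateralFilter` in the SharpenProcessor above is an edge-preserving smoothing filter: it keeps strong edges intact but does not increase their contrast. If you want a processor that actually sharpens, a common option is unsharp masking, which adds back a multiple of the difference between the image and a blurred copy of it. The sketch below is our own NumPy-only illustration, not the project’s code; `box_blur3` and `unsharp_mask` are hypothetical helpers, and a 3×3 box blur stands in for the Gaussian blur usually used (with OpenCV, the same idea is typically written with `cv2.GaussianBlur` and `cv2.addWeighted`).

```python
import numpy as np

def box_blur3(image: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication, returned as float32."""
    img = image.astype(np.float32)
    # Pad only the two spatial axes; leave a color axis (if any) alone.
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    acc = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def unsharp_mask(image: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Sharpen by adding `amount` times the detail the blur removes."""
    img = image.astype(np.float32)
    detail = img - box_blur3(image)
    sharpened = img + amount * detail
    return np.clip(np.rint(sharpened), 0, 255).astype(np.uint8)

# A vertical step edge from gray level 50 to 200.
row = np.array([50, 50, 50, 200, 200, 200], dtype=np.uint8)
edge = np.tile(row, (4, 1))
print(unsharp_mask(edge)[0].tolist())  # [50, 50, 0, 250, 200, 200]
```

The dark side of the edge gets darker and the bright side brighter, while flat regions are left unchanged. Wrapped in a class with `name()` and `process()` methods, such a filter could be used in the same way as the processors above.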

One more transformation we need just copies the original image (we’ll explain the need for this later):

class OriginalProcessor:
    def name(self):
        # Empty suffix: the output keeps the source file's base name
        return ""

    def process(self, image):
        # Return the image unchanged
        return image

And Now All the Tricks in a Single Algorithm

Now we have all the transformations we’re going to use. Here is the algorithm that automatically runs all our images through the transformations:

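import os

import cv2

# Extensions treated as images (an assumption of this snippet)
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")

class Augmentor:
    def __init__(self, processors):
        self.processors = processors

    def generate(self, source, dest):
        # The project uses its FileUtils.get_files helper here; listing the
        # image files directly keeps this snippet self-contained. The list is
        # built before writing, so when source == dest new files are not reprocessed.
        files = [os.path.join(source, n) for n in sorted(os.listdir(source))
                 if n.lower().endswith(IMAGE_EXTENSIONS)]
        if not os.path.isdir(dest):
            os.makedirs(dest)
        for fname in files:
            img = cv2.imread(fname)
            if img is None:  # skip files OpenCV cannot read
                continue
            base = os.path.splitext(os.path.basename(fname))[0]
            for proc in self.processors:
                p_img = proc.process(img)
                # New name: original base name + processor name, saved as PNG
                dfname = os.path.join(dest, base + proc.name() + ".png")
                cv2.imwrite(dfname, p_img)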

The generate method iterates over the files in the source folder, applies all the specified transformations to them, and saves the modified images to the destination directory under a new name: the original base name with the processor name appended.

We can apply the augmentation algorithm to our sample images with the following code:

folder = r"C:\PI_PEST\moose"
gen_folder = r"C:\PI_PEST\moose_gen"

# Step 1: copy the originals and add their mirrored versions
processors = [OriginalProcessor(), FlipProcessor()]
gen = Augmentor(processors)
gen.generate(folder, gen_folder)

# Step 2: smooth and sharpen everything produced by step 1
processors = [SmoothProcessor(5), SharpenProcessor(5)]
gen = Augmentor(processors)
gen.generate(gen_folder, gen_folder)

# Step 3: rotate everything produced so far by +15 and -15 degrees
processors = [RotateProcessor(15, 1.2), RotateProcessor(-15, 1.2)]
gen = Augmentor(processors)
gen.generate(gen_folder, gen_folder)

Note how we implemented augmentation in three sequential steps.

First, we applied the flip and the "original" processors. This is why OriginalProcessor simply returns the image as-is: after this step, we want both the original and the flipped images in the moose_gen folder.

In the second step, we applied the smoothing and sharpening processors to all the images generated in the first step.

In the last step, we applied two rotations (by 15 degrees counterclockwise and clockwise, each with a scale factor of 1.2) to all the images generated in the two previous steps. The resulting images for one sample are shown below:

Image 1

Next Steps

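How many samples do we have now? Each original image yields 2 images after the first step, 2 + 2 × 2 = 6 after the second, and 6 + 6 × 2 = 18 after the third, so we have 18 times as many samples as before!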

We now have 3600 samples for the moose class (200 × 18) and 5112 samples for the background class (284 × 18). For our purposes, a dataset of this size will provide acceptable moose detection results.
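To see what the generated files are called and how the three steps compound, the following sketch replays the three generate calls on file names only. It assumes, as the description of generate states, that each output is named as the source file’s base name followed by the processor’s name, that the second and third calls read from and write to the same moose_gen folder (so earlier files are kept), and a hypothetical source image named img001.

```python
# Suffixes returned by name() for the processors used in each step:
# OriginalProcessor, FlipProcessor; SmoothProcessor(5), SharpenProcessor(5);
# RotateProcessor(15, 1.2), RotateProcessor(-15, 1.2).
STEPS = [["", "flip"], ["smooth5", "sharpen5"], ["rotate15", "rotate_15"]]

def simulate_pipeline(stem):
    """Return the file stems that end up in the generated folder."""
    files = []
    for i, suffixes in enumerate(STEPS):
        # Step 1 reads the original folder; later steps read the generated
        # folder, whose existing files stay in place.
        sources = [stem] if i == 0 else files
        new = [src + suffix for src in sources for suffix in suffixes]
        files = new if i == 0 else files + new
    return files

names = simulate_pipeline("img001")
print(len(names))  # 18
print(names[:6])
# ['img001', 'img001flip', 'img001smooth5', 'img001sharpen5',
#  'img001flipsmooth5', 'img001flipsharpen5']
print(len(names) * 200, len(names) * 284)  # 3600 5112
```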

But if you were developing a commercial pest detector, you’d want a much larger dataset. A common theme you’ll encounter when working on AI projects is that acquiring good data is the most difficult task.

There are well-known, battle-tested DNN architectures for many common types of problems, but acquiring a large set of images, cleaning them, and augmenting them can take days, weeks, or even months depending on the size of the problem you are solving.

In the next article, we’ll discuss training our DNN classifier with the augmented dataset.

This article is part of the series 'Real-time AI Pest Elimination on Edge Devices'.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
Team Leader VIPAKS
Russian Federation

Master’s degree in Mechanics.

PhD degree in Mathematics and Physics.


15 years’ experience in developing scientific programs (C#, C++, Delphi, Java, Fortran).


Mathematical modeling, symbolic computer algebra, numerical methods, 3D geometry modeling, artificial intelligence, differential equations, boundary value problems.
