Data augmentation sample. Credit: my pet

Data Augmentation in Under 2 Minutes with TensorFlow

Edward Ortiz


A baseline approach to automating data augmentation with TensorFlow

In machine learning, it is common practice to train on a large amount of data so that experiments and models are reliable. Sometimes, however, there is not enough data to build a good ML model, and that is where techniques for expanding the size of a training set come into play.

For example, imagine that a deep neural network is a car that needs to reach a certain destination. To get there, the car needs enough gas, and that gas is the data the model requires. The analogy captures the relation between a model and its data: without enough of it, the model cannot perform well.

As a rule of thumb, the amount of data required grows with the number of learnable parameters in the model.

Data augmentation is a machine learning technique that applies different transformations to the available data to synthesize new data, expanding the training set so the model can learn properly. Deep neural networks trained to achieve high performance on complex tasks generally have a large number of hidden neurons, and as the number of hidden neurons increases, so does the number of trainable parameters.
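
To make the link between hidden neurons and trainable parameters concrete, here is a minimal sketch in plain Python. The helper dense_param_count and the layer sizes are made up for illustration, and the count assumes a plain fully connected network with one bias per unit.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first, e.g. [784, 128, 10].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Same input and output sizes, only the hidden layer gets wider
small = dense_param_count([784, 128, 10])
wide = dense_param_count([784, 512, 10])
print(small, wide)  # prints 101770 407050
```

Widening the single hidden layer from 128 to 512 units roughly quadruples the number of parameters the training data has to pin down.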

In an image classification application, for example, the performance of a model trained without augmentation might be 57%, while with data augmentation it could increase up to 78%. Data augmentation can be done with regular transformations or with GAN-based techniques, and it applies not only to image classification but to text classification as well.

With TensorFlow 2.3.0, data augmentation can be automated with the following steps:

Step 1

Import tensorflow and, optionally, tensorflow_datasets to get some sample data to play with

import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

Step 2

Load the dataset

doggies = tfds.load('stanford_dogs', split='train', as_supervised=True)
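
With as_supervised=True, each element of the dataset is an (image, label) pair. The sketch below shows one way to peek at the first pair. To keep it runnable without downloading anything, it uses a small synthetic stand-in dataset; describe_first_example and fake_doggies are hypothetical names, not part of TensorFlow.

```python
import tensorflow as tf


def describe_first_example(dataset):
    """Return (shape, dtype, label) for the first (image, label) pair."""
    for image, label in dataset.take(1):
        return tuple(image.shape.as_list()), image.dtype, int(label.numpy())


# Synthetic stand-in: four random 32x32 RGB "images" with integer labels
images = tf.cast(
    tf.random.uniform([4, 32, 32, 3], maxval=256, dtype=tf.int32), tf.uint8
)
labels = tf.constant([0, 1, 2, 3], dtype=tf.int64)
fake_doggies = tf.data.Dataset.from_tensor_slices((images, labels))

shape, dtype, label = describe_first_example(fake_doggies)
print(shape, dtype, label)
```

The same helper should work on the doggies dataset loaded above, since it only relies on the (image, label) structure.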

Step 3

Use the tf.image module with the transformation you want. See the TensorFlow documentation for more options

def flip_image(image):
    """Flip an image horizontally (left to right)."""
    flip = tf.image.flip_left_right(image)
    return flip
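
A function like this only transforms one image, so here is a minimal sketch of how it might be applied to a whole dataset to actually expand it. It assumes the dataset yields (image, label) pairs, as tfds.load does with as_supervised=True. The helper add_flipped_copies and the toy dataset are made up for illustration, so the example runs without a download.

```python
import tensorflow as tf


def flip_image(image):
    """Flip an image horizontally (left to right)."""
    return tf.image.flip_left_right(image)


def add_flipped_copies(dataset):
    """Return the dataset followed by a flipped copy of every image."""
    flipped = dataset.map(lambda image, label: (flip_image(image), label))
    return dataset.concatenate(flipped)


# Toy stand-in: a single image with height 1, width 3, and 1 channel
image = tf.constant([[[1], [2], [3]]], dtype=tf.uint8)
toy = tf.data.Dataset.from_tensors((image, tf.constant(0, dtype=tf.int64)))

augmented = add_flipped_copies(toy)
rows = [img.numpy()[0, :, 0].tolist() for img, _ in augmented]
print(rows)  # prints [[1, 2, 3], [3, 2, 1]]
```

The result has twice as many examples as the input. Other tf.image functions, such as tf.image.rot90 or tf.image.random_brightness, could be swapped in for the flip in the same way.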

Final remarks

It is as simple as that: you can create augmented data and expand the size of your dataset when working with images. I encourage you to check out the tf.image module and try the different methods it offers, so that next time data augmentation does not seem like a difficult technique to handle.
