Data Augmentation

Paulo Morillo
Nov 19, 2020

Another cause of overfitting is not having enough data to make a correct prediction. Imagine someone searching for a missing family member. An old man shows you a black-and-white picture of a young woman and asks, "Have you seen my wife?" You might not laugh if this actually happened, but you would probably ask him whether he has a more recent picture of her. The point is that it is very difficult to identify or predict something without enough data. If you want to help, you ask for more information: her age, her hair (long, gray, black), what she is wearing, and so on. With more data, you are much more likely to recognize the old lady.
So how can we get more data to grow our dataset? We should think about what our validation data might look like and how it could differ from the training data: color, shape, the orientation of objects, and anything else we would have wanted the training data to cover.

For example, suppose we need to build a model that predicts whether or not you are smiling, and we have only one picture of you smiling. Our validation data, however, contains 30 pictures taken with different filters: black and white, or only the red, green, or blue component of the picture. Some of them might even be your training picture placed in the wrong orientation, flipped vertically.
As you can imagine, the model could struggle to recognize these cases, so we should anticipate them and increase our data (data augmentation). To do this, we apply the same filters (black and white, and the red, green, and blue components) to our training picture and save the results as additional training examples.
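The filters described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's code: the picture is assumed to be a list of rows of (red, green, blue) tuples with values from 0 to 255, the function names are hypothetical, and the black-and-white conversion uses the common ITU-R BT.601 luminance weights as one reasonable choice. Real projects would normally use an image library instead of nested lists.

```python
from typing import List, Tuple

# Assumed representation: rows of (r, g, b) pixels, each value 0-255.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def to_grayscale(img: Image) -> Image:
    """Black and white: replace each pixel with its luminance (BT.601 weights, an assumed choice)."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            y = round(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def keep_channel(img: Image, channel: int) -> Image:
    """Keep only one color component (0 = red, 1 = green, 2 = blue) and zero the others."""
    return [
        [tuple(v if i == channel else 0 for i, v in enumerate(px)) for px in row]
        for row in img
    ]


def flip_vertical(img: Image) -> Image:
    """Upside-down flip: reverse the order of the rows."""
    return [list(row) for row in reversed(img)]


def augment(img: Image) -> List[Image]:
    """Return the original picture plus one variant per filter (hypothetical helper)."""
    return [
        img,
        to_grayscale(img),
        keep_channel(img, 0),
        keep_channel(img, 1),
        keep_channel(img, 2),
        flip_vertical(img),
    ]


# Usage: a tiny 2x2 "picture" stands in for the single smiling photo.
smile = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
training_set = augment(smile)
print(len(training_set))  # 6
```

One training picture becomes six, each covering a variation we expect to see in the validation data.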

Scaling, resizing, changing brightness or colors, and rotation are other ways to increase your training dataset.
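A few of these transformations can be sketched the same way. Again this is a hedged illustration, not a definitive implementation: it assumes the same list-of-rows RGB representation, the function names are hypothetical, rotation is simplified to 90-degree steps (arbitrary angles need interpolation), and resizing uses nearest-neighbor sampling, one of several possible methods.

```python
from typing import List, Tuple

# Assumed representation: rows of (r, g, b) pixels, each value 0-255.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def change_brightness(img: Image, factor: float) -> Image:
    """Multiply every channel by `factor`, clamping results to the 0-255 range."""
    return [
        [tuple(min(255, max(0, round(v * factor))) for v in px) for px in row]
        for row in img
    ]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*reversed(img))]


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Scale/resize with nearest-neighbor sampling: each output pixel copies the closest input pixel."""
    h, w = len(img), len(img[0])
    return [
        [img[y * h // new_h][x * w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


# Usage: create brighter, rotated, and enlarged copies of one picture.
picture = [[(100, 50, 0), (200, 200, 200)],
           [(0, 0, 0), (10, 20, 30)]]
brighter = change_brightness(picture, 1.5)  # (200, 200, 200) is clamped to (255, 255, 255)
rotated = rotate_90_clockwise(picture)
bigger = resize_nearest(picture, 4, 4)      # 2x2 -> 4x4
```

Clamping in `change_brightness` keeps values valid; without it, bright pixels would overflow the 0-255 range.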

Conclusion

Data augmentation needs care: in some cases we can add many images that hurt our predictions. We should think about the cases the model is likely to face in the future and try not to add noise to our training dataset.


Paulo Morillo

Fullstack developer and sound engineer, learning ML