Everything you need to know about Min-Max normalization: A Python tutorial

 

Introduction

This is my second post about normalization techniques that are often used prior to machine learning (ML) model fitting. In my first post, I covered the standardization technique using scikit-learn’s StandardScaler function. If you are not familiar with standardization, you can learn the essentials in only 3 minutes by clicking here.

In the present post, I will explain the second most popular normalization method, i.e. Min-Max scaling, using scikit-learn (class name: MinMaxScaler).

Core of the method

Another way to normalize the input features/variables (apart from standardization, which scales the features so that they have μ=0 and σ=1) is the Min-Max scaler. With this approach, all features are transformed into the range [0,1], meaning that the minimum and maximum value of each feature/variable will be 0 and 1, respectively.
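
Note that the target range does not have to be [0,1]: scikit-learn’s MinMaxScaler accepts a feature_range argument. Below is a minimal sketch on a made-up toy matrix (the values are chosen only for illustration):

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# toy data: two features on very different scales
X_toy = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 800.0]])

# default: each column is mapped independently to [0, 1]
print(MinMaxScaler().fit_transform(X_toy))

# the target range is configurable, e.g. [-1, 1]
print(MinMaxScaler(feature_range=(-1, 1)).fit_transform(X_toy))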

Why normalize prior to model fitting?

The main idea behind normalization/standardization is always the same. Variables measured at different scales do not contribute equally to the model fitting and the learned function, and they may end up introducing a bias. To deal with this potential problem, feature-wise normalization such as Min-Max scaling is usually applied prior to model fitting.

This can be very useful for some ML models such as the Multi-Layer Perceptron (MLP), where back-propagation can be more stable and even faster when the input features are min-max scaled (or scaled in general) than when the original unscaled data are used.
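
In practice, the scaler is often chained with the estimator in a pipeline, so that scaling is always applied before the model sees the data and the min/max statistics are learned from the training set only. Here is a minimal sketch (the pipeline pattern is standard scikit-learn; the iris data and the MLP settings are just illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# the pipeline fits the scaler on the training data only and
# reuses those min/max statistics when transforming the test data
model = make_pipeline(MinMaxScaler(), MLPClassifier(max_iter=1000, random_state=0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))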

Note: Tree-based models are usually not dependent on scaling, but non-tree models such as SVM, LDA, etc. often depend on it heavily.

The mathematical formulation

For a single feature/variable vector x, the min-max scaling is computed as:

x_scaled = (x - min(x)) / (max(x) - min(x))

where min(x) and max(x) are the minimum and maximum values of the feature, respectively.
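
As a quick sanity check, here is a minimal sketch applying the formula to a made-up toy vector (values chosen only for illustration):

import numpy as np

x = np.array([2.0, 5.0, 10.0])  # toy feature vector
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.    0.375 1.   ]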

Python working example

Here we will use the famous iris dataset that is available through scikit-learn.

Reminder: scikit-learn estimators expect as input a numpy array X with shape [samples, features/variables].

from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# use the iris dataset
X, y = load_iris(return_X_y=True)
print(X.shape)
# (150, 4) # 150 samples (rows) with 4 features/variables (columns)
# build the scaler model
scaler = MinMaxScaler()
# fit the scaler to the data (learns the per-feature min and max)
scaler.fit(X)
# transform the data
X_scaled = scaler.transform(X)
# Verify minimum value of all features
X_scaled.min(axis=0)
# array([0., 0., 0., 0.])
# Verify maximum value of all features
X_scaled.max(axis=0)
# array([1., 1., 1., 1.])
# Manually normalize without using scikit-learn
X_manual_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# Verify manually VS scikit-learn estimation
print(np.allclose(X_scaled, X_manual_scaled))
#True
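
The fitted scaler also exposes the per-feature statistics it learned, and the transformation can be reversed with inverse_transform. A short sketch, continuing from the variables above:

# per-feature minima and maxima learned during fit
print(scaler.data_min_)  # array([4.3, 2. , 1. , 0.1])
print(scaler.data_max_)  # array([7.9, 4.4, 6.9, 2.5])

# map the scaled values back to the original scale
X_restored = scaler.inverse_transform(X_scaled)
print(np.allclose(X, X_restored))
# True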

The effect of the transform in a visual example

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1,2)
axes[0].scatter(X[:,0], X[:,1], c=y)
axes[0].set_title("Original data")
axes[1].scatter(X_scaled[:,0], X_scaled[:,1], c=y)
axes[1].set_title("MinMax scaled data")
plt.show()

It is obvious that the values of the features are within the range [0,1] following the Min-Max scaling (right plot).

Another visual example from scikit-learn website

The Min Max scaling effect. Figure taken from scikit-learn documentation: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html

Summary

  • One important thing to keep in mind when using Min-Max scaling is that it is highly influenced by the minimum and maximum values in the data, so if the data contain outliers the scaling will be biased towards them.
  • MinMaxScaler rescales the data set such that all feature values are in the range [0, 1]. This is done feature-wise in an independent way.
  • If outliers are present, MinMaxScaler might compress all inliers into a narrow range.

How to deal with outliers

  • Manual way (not recommended): Visually inspect the data and remove outliers using statistical outlier-removal methods.
  • Recommended way: Use the RobustScaler, which also scales the features, but using statistics that are robust to outliers. This scaler removes the median and scales the data according to a quantile range (by default the IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile). A short sketch comparing the two scalers is shown below.
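
Here is a minimal sketch on a made-up single feature containing one artificial outlier (the numbers are only for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# toy single-feature data with one artificial outlier
X_outlier = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# MinMaxScaler: the outlier squeezes the inliers into a narrow range near 0
print(MinMaxScaler().fit_transform(X_outlier).ravel())

# RobustScaler: centers on the median and scales by the IQR,
# so the inliers keep a reasonable spread
print(RobustScaler().fit_transform(X_outlier).ravel())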

That’s all for today! Hope you liked this post! Next story coming next week. Stay tuned & safe.

- My mailing list in just 5 seconds: https://seralouk.medium.com/subscribe

- Become a member and support me: https://seralouk.medium.com/membership

Stay tuned & support me

If you liked and found this article useful, follow me and applaud my story to support me!

Resources

See all scikit-learn normalization methods side-by-side here: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html

