Outlier detection using IQR method

Neetika Yadav
2 min read · Feb 28, 2021

An observation that deviates from the overall trend of the data is known as an outlier. An outlier can result from human error, a measurement lapse, or a genuine anomaly. Not all outliers are bad: some reveal interesting patterns in the dataset, such as very high or very low scoring students in a class, or unusual transactions that help catch bank fraud. In this article, we will discuss a widely used metric for detecting outliers in a dataset: the IQR (interquartile range).

By Will Myers on Unsplash

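IQR stands for interquartile range. Quartiles divide the sorted data into four equal parts, and the IQR is the difference between the third quartile (Q3) and the first quartile (Q1), so it covers the middle 50% of the data.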

A rule of thumb says that a data point is an outlier if it lies more than 1.5 times the IQR above the third quartile or more than 1.5 times the IQR below the first quartile, that is, above Q3 + 1.5*IQR or below Q1 - 1.5*IQR.

Let us consider the set of data points:

5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25

The first quartile, Q1, is 20. It can be calculated in Excel using the QUARTILE function.

The third quartile, Q3, is 23.5.

Hence the IQR is 23.5 - 20 = 3.5.
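
The same quartiles can be reproduced outside Excel. Below is a minimal sketch in Python, assuming the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates between neighbouring data points and gives the same Q1 and Q3 as above for this dataset. The variable names are illustrative.

```python
from statistics import quantiles

data = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23,
        23, 23, 23, 23, 24, 24, 24, 24, 25]

# n=4 asks for quartile cut points; the result is [Q1, median, Q3].
q1, median, q3 = quantiles(data, n=4, method="inclusive")

# The IQR is the spread of the middle 50% of the data.
iqr = q3 - q1

print(q1, q3, iqr)  # 20.0 23.5 3.5
```

Other quartile conventions (for example the default `method="exclusive"`) can give slightly different values on small datasets, so it helps to state which one is used.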

Now let's calculate the lower bound, Q1 - 1.5*IQR, which is 20 - 1.5*3.5 = 20 - 5.25 = 14.75.

Similarly, the upper bound, Q3 + 1.5*IQR, is 23.5 + 5.25 = 28.75.

Any point in our dataset that is lower than 14.75 or greater than 28.75 can be deemed an outlier. Hence the data points 5, 7, and 10 are outliers in our dataset, while 15 lies just inside the lower bound. The box plot uses the same IQR bounds and draws the points beyond them individually as outliers. Like the z-score, the IQR rule flags points that lie far from the bulk of the data, but it is based on quartiles rather than on the mean and standard deviation.
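
The whole rule, with bounds at Q1 - 1.5*IQR and Q3 + 1.5*IQR, can be sketched as a small Python function. This is an illustrative sketch, not a definitive implementation: the function name `iqr_outliers` and the parameter `k` are hypothetical, and the quartiles are assumed to follow the inclusive (interpolating) convention.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_bound, upper_bound, outliers) using the IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # points below this are flagged
    upper = q3 + k * iqr  # points above this are flagged
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23,
        23, 23, 23, 23, 24, 24, 24, 24, 25]

lower, upper, outliers = iqr_outliers(data)
print(lower, upper)  # 14.75 28.75
print(outliers)      # [5, 7, 10]
```

Keeping the multiplier as a parameter makes it easy to try a stricter or looser cutoff than the usual 1.5.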

Whether an outlier should be removed or scaled depends on the business objective and should be decided case by case.
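
To make the two options concrete, here is a hedged sketch of both treatments in Python: dropping the flagged points, and capping them at the IQR bounds. Capping is only one possible way to keep a point while limiting its influence. It is an assumption here, not something the article prescribes, and the helper names `drop_outliers` and `cap_outliers` are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Lower and upper IQR bounds, using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, lower, upper):
    # Keep only the points that fall inside the bounds.
    return [v for v in values if lower <= v <= upper]

def cap_outliers(values, lower, upper):
    # Pull every point outside the bounds back to the nearest bound.
    return [min(max(v, lower), upper) for v in values]

data = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23,
        23, 23, 23, 23, 24, 24, 24, 24, 25]

lower, upper = iqr_bounds(data)
dropped = drop_outliers(data, lower, upper)
capped = cap_outliers(data, lower, upper)

print(len(data), len(dropped))  # 19 16
print(capped[:4])               # [14.75, 14.75, 14.75, 15]
```

Dropping changes the sample size, while capping keeps every row but alters the extreme values, so the right choice depends on how the data will be used.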

Written by Neetika Yadav

Works at EY — Risk analysis
