Z-score for outlier detection

Neetika Yadav
Mar 7, 2021

Values that do not follow the overall pattern and diverge from the rest of the data set can be regarded as outliers. It is worth pinning down the outliers in our data, either to mitigate them or to study them carefully, as they can be of great statistical and business importance.

Photo by Dan Meyers at Unsplash

The z-score, also known as the standard score, is a statistical measure of how much a value deviates from the mean: it tells us how many standard deviations a data point lies from the mean.

Z-score can be calculated as:

z-score = (x - mean) / standard deviation
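To make the formula concrete, here is a minimal Python sketch. The function name z_scores and the small sample list are made up for illustration, and the sketch assumes the population standard deviation (dividing by n) is the one intended.

```python
import statistics

def z_scores(values):
    """Return the z-score of every value in `values` (hypothetical helper)."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divides by n, not n - 1).
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

# Made-up sample with mean 5 and population standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(sample))
```

Swapping statistics.pstdev for statistics.stdev would use the sample standard deviation (dividing by n - 1) instead, which gives slightly smaller z-scores in absolute value.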

A commonly used threshold for the z-score is 3. If the z-score of a data point is greater than 3 or less than -3, that is, if the point lies more than 3 standard deviations from the mean, it can be regarded as an outlier.
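The thresholding rule can be sketched in a few lines of Python. The helper find_outliers and the readings list are hypothetical, the threshold of 3 is exposed as a parameter so it can be adjusted, and the population standard deviation is assumed.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is beyond +/- threshold (hypothetical helper)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [x for x in values if abs((x - mean) / std) > threshold]

# Made-up readings: nineteen ordinary values and one extreme value.
readings = [10] * 19 + [100]
print(find_outliers(readings))
```

Using abs() covers both sides of the rule at once, so values far below the mean are flagged just like values far above it.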

Let us consider a set of data points:

399,114,737,677,438,806,231,607,880,550,374,748,342,985,853,187,762,953,914,453,2010,2179,3800

The mean of these points is 869.52 and the standard deviation (population, i.e. dividing by the number of points) is about 792.1. After calculating the z-score for each individual point, we find that only 3800 has a z-score greater than 3, at about 3.7, and hence it is the only outlier in this dataset.
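The worked example can be checked with a short Python script. This is a sketch rather than the exact calculation used for the article: the variable names are illustrative, and it assumes the population standard deviation, which is the one consistent with the figures quoted above.

```python
import statistics

data = [399, 114, 737, 677, 438, 806, 231, 607, 880, 550, 374, 748,
        342, 985, 853, 187, 762, 953, 914, 453, 2010, 2179, 3800]

mean = statistics.fmean(data)
std = statistics.pstdev(data)  # assumption: population standard deviation
scores = [(x - mean) / std for x in data]

# Keep every point whose z-score lies beyond the +/- 3 threshold.
outliers = [(x, z) for x, z in zip(data, scores) if abs(z) > 3]

print(f"mean = {mean:.2f}, standard deviation = {std:.2f}")
for x, z in outliers:
    print(f"{x} is an outlier with z-score {z:.2f}")
```

With the sample standard deviation (statistics.stdev) the z-score of 3800 comes out a little lower, but it still exceeds 3.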

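Z-score values can be both negative and positive: a negative z-score means the data point lies below the mean, and a positive z-score means it lies above the mean.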
