Category Archives: Statistics

Emulating Excel’s NORMDIST in T-SQL

This is something that came up at work recently.  A user wanted to migrate a report from Excel to SSRS and needed to be able to implement the NORMDIST function in T-SQL.  NORMDIST takes four parameters:

  • x, a number indicating the value of interest on the normal distribution curve
  • mu, a number indicating the sample mean
  • sigma, a number indicating the sample standard deviation
  • cumulative, a boolean indicating whether to compute the cumulative value of the normal distribution curve (in other words, the integral of the normal distribution from negative infinity to x). For what I have done here, I am assuming that cumulative is TRUE.  The results for cumulative = FALSE will be completely different.

My statistics knowledge is, to say the very least, quite rusty, but with the help of a couple of Wikipedia pages (this and this), I came up with the following T-SQL that (at least for a smattering of value I tested) comes pretty close to the Excel NORMDIST function.

-- Initialize parameters
declare @x float = 1.5;
declare @mu float = 0.0;
declare @sigma float = 1.0;
-- Compute NORMDIST
declare @erfx float = (@x - @mu) / sqrt(2.0 * power(@sigma, 2.0));
declare @sign float = sign(@erfx);
select @erfx = abs(@erfx);
declare @t float = 1.0 / (1.0 + 0.3275911 * @erfx);
declare @erf float = @sign * 
	(1.0 - (0.254829592 * @t - 0.284496736 * power(@t, 2.0) +
	1.421413741 * power(@t, 3.0) - 1.453152027 * power(@t, 4.0) +
	1.061405429 * power(@t, 5.0)) * exp(-power(@erfx, 2.0)));
declare @normDist float = 0.5 * (1.0 + @erf);
select @normDist;

Of course, the usual caveats apply.  Be sure to validate this script against Excel for any values of interest, and please let me know if you find any errors in the results.

A final tip:  Tempting as it may be, don’t implement this as a user-defined function.  Scalar UDFs really are quite bad for performance.