Click here to Skip to main content
15,888,733 members
Articles / General Programming / Algorithms
Tip/Trick

Standard Deviation Extension for Enumerable

Rate me:
Please Sign up or sign in to vote.
4.67/5 (3 votes)
7 Jun 2013CPOL 23K   7   9
Calcution of a standard deviation and filtering outliers in a LINQ-style.

Introduction  

The following class provides two extensions to the .NET Enumerable class:

  1. Standard deviation calculation.
  2. Outlier removal using a k-sigma filter (which of course becomes a three-sigma rule for k=3).

See http://en.wikipedia.org/wiki/Three_sigma_rule for some basics. Please use the message board below to post suggestions or report bugs. Have fun!

Using the code 

Source code:

C#
using System;
using System.Collections.Generic;
using System.Linq;
public static class StandardDeviationEnumerableExtensions
{
    /// <summary>
    /// Calculates a standard deviation of elements, using a specified selector.
    /// </summary>
    public static double StandardDeviation<T>(
      this IEnumerable<T> enumerable, Func<T, double> selector)
    {
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {   double diff= selector(item) - average;
            sum += diff*diff;
            N++;
        }
        return N == 0 ? 0 : Math.Sqrt(sum / N);
    }
    /// <summary>
    /// Filters elements to remove outliers. The enumeration will be
    /// selected three times, first to calculate an average, second
    /// for a standard deviation, and third to yield remiaining elements. The outliers are these
    /// elements which are further from an average than k*(standard deviation). Set k=3 for
    /// standard three-sigma rule.
    /// </summary>
    public static IEnumerable<T> SkipOutliers<T>(
       this IEnumerable<T> enumerable, double k, Func<T, double> selector)
    {
        // Duplicating a SD code to avoid calculating an average twice.
        double sum = 0;
        double average = enumerable.Average(selector);
        int N = 0;
        foreach (T item in enumerable)
        {   double diff = selector(item) - average;
            sum += diff*diff;
            N++;
        }
        double SD = N == 0 ? 0 : Math.Sqrt(sum / N);
        double delta = k * SD;
        foreach (T item in enumerable)
        {
            if (Math.Abs(selector(item) - average) <= delta)
                yield return item;
        }
    }
}

Usage:

C#
IEnumerable<double> results = new double[] { 1, 1.1, 1.2, 0.9, 2, 0.8 };
double[] filtered;
// contains all elements
filtered = results.SkipOutliers(k: 3, selector: result => result).ToArray();
// contains all elements except 2.0. That is, filtered={ 1, 1.1, 1.2, 0.9, 0.8 }
filtered = results.SkipOutliers(k: 2, selector: result => result).ToArray();
// contains just one element, 1.2, which is closest to an average. That is, filtered={ 1.2 }
filtered = results.SkipOutliers(k: 0.1, selector: result => result).ToArray();
// a singleton is always equal to it's average, so it's yielded even with k==0.
// That is, filtered={ 1.2 }
filtered = filtered.SkipOutliers(k: 0, selector: result => result).ToArray();

So, with k parameter you can adjust how strict the filtering is. If k==0, then only those elements which are equal to an average are yielded. However, do not use k==0 because doubles should not be tested for equality in this way.

History

  • 2013-06-03 -- Original version posted.
  • 2013-06-04 -- Possible unwanted division by zero bug-fix.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Poland Poland
My name is Jacek. Currently, I am a Java/kotlin developer. I like C# and Monthy Python's sense of humour.

Comments and Discussions

 
QuestionUsing incremental computation to avoid multiple passes Pin
Philippe Bouteleux6-Jun-13 2:02
Philippe Bouteleux6-Jun-13 2:02 
SuggestionMath.Pow(..., 2) is very inefficient Pin
Matt T Heffron4-Jun-13 16:13
professionalMatt T Heffron4-Jun-13 16:13 
GeneralRe: Math.Pow(..., 2) is very inefficient Pin
Lutosław4-Jun-13 20:53
Lutosław4-Jun-13 20:53 
NewsUpdate Pin
Lutosław4-Jun-13 0:13
Lutosław4-Jun-13 0:13 
QuestionPossible divide by zero error? Pin
George Swan3-Jun-13 21:29
mveGeorge Swan3-Jun-13 21:29 
AnswerRe: Possible divide by zero error? Pin
Lutosław3-Jun-13 23:16
Lutosław3-Jun-13 23:16 
GeneralRe: Possible divide by zero error? Pin
George Swan3-Jun-13 23:42
mveGeorge Swan3-Jun-13 23:42 
GeneralRe: Possible divide by zero error? Pin
Lutosław4-Jun-13 0:05
Lutosław4-Jun-13 0:05 
SuggestionRe: Possible divide by zero error? Pin
Paul R Benson6-Jun-13 21:01
Paul R Benson6-Jun-13 21:01 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.