Modularity from Lazy Evaluation - Performance Testing

This is the second example of how higher-order functions and lazy evaluation can reduce complexity and lead to more modular software.

Background

To performance test code a number of iterations have to be performed so a more stable average can be compared. The number of iterations is usually an input and chosen arbitrarily.

This post covers how statistical tests can be used to remove the need for this input. This simplifies performance testing and makes the results more robust and useful.

Statistics

Lazy evaluation can be used to produce an online sample statistics sequence.

The standard error of a sample mean is given by

\[SE_{\bar{x}} = \frac{s}{\sqrt{n}}\]

The statistics sequence can be iterated until a given mean standard error level of accuracy is achieved.

Welch's t-test for comparing the mean of two samples is given by

\[t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{f_1+f_2}}\]

\[df = \frac{\left(f_1+f_2\right)^2}{\frac{f_1^2}{n_1-1}+\frac{f_2^2}{n_2-1}}\]

\[f_i = \frac{s_i^2}{n_i}\]

where \(n\) is the sample size, \(\bar{x}\) is the sample mean, \(s^2\) is the sample variance, \(t\) is Welch's t statistic, and \(df\) is the degrees of freedom of the statistic.

Welch's t statistic can be compared to the inverse Student's t-distribution for a given confidence level to test if the sample means are different.

Two sample statistics sequences can be iterated together and compared to decide if one mean is larger than the other.

Statistics code

type SampleStatistics = {N:int;Mean:float;Variance:float}
                        member s.StandardDeviation = sqrt s.Variance
                        member s.MeanStandardError = sqrt(s.Variance/float s.N)

type WelchStatistic = {T:float;DF:int}

module Statistics =
    /// Online statistics sequence for a given sample sequence.
    let inline sampleStatistics s =
        let calc (n,m,s) x =
            let m'=m+(x-m)/float(n+1)
            n+1,m',s+(x-m)*(x-m')
        Seq.map float s |> Seq.scan calc (0,0.0,0.0) |> Seq.skip 3
        |> Seq.map (fun (n,m,s) -> {N=n;Mean=m;Variance=s/float(n-1)})

    /// Scale the statistics for a given underlying random variable change of scale.
    let scale f s = {s with Mean=s.Mean*f;Variance=s.Variance*sqr f}

    /// Single iteration statistics for a given iteration count and total statistics.
    let singleIteration ic s =
        {N=s.N*ic;Mean=s.Mean/float ic;Variance=s.Variance/float ic}

    /// Student's t-distribution inverse for 0.1% confidence level by
    /// degrees of freedom.
    let private tInv = 
        [|636.6;31.60;12.92;8.610;6.869;5.959;5.408;5.041;4.781;4.587;4.437;4.318;4.221;
          4.140;4.073;4.015;3.965;3.922;3.883;3.850;3.819;3.792;3.768;3.745;3.725;3.707;
          3.690;3.674;3.659;3.646;3.633;3.622;3.611;3.601;3.591;3.582;3.574;3.566;3.558;
          3.551;3.544;3.538;3.532;3.526;3.520;3.515;3.510;3.505;3.500;3.496;3.492;3.488;
          3.484;3.480;3.476;3.473;3.470;3.466;3.463;3.460;3.457;3.454;3.452;3.449;3.447;
          3.444;3.442;3.439;3.437;3.435;3.433;3.431;3.429;3.427;3.425;3.423;3.421;3.420;
          3.418;3.416;3.415;3.413;3.412;3.410;3.409;3.407;3.406;3.405;3.403;3.402;3.401;
          3.399;3.398;3.397;3.396;3.395;3.394;3.393;3.392;3.390|]

    /// Welch's t-test statistic for two given sample statistics.
    let welchStatistic s1 s2 =
        let f1 = s1.Variance/float s1.N
        let f2 = s2.Variance/float s2.N
        {
            T = (s1.Mean-s2.Mean)/sqrt(f1+f2)
            DF= if f1=0.0 && f2=0.0 then 1
                else sqr(f1+f2)/(sqr f1/float(s1.N-1)+sqr f2/float(s2.N-1)) |> int
        }

    /// Welch's t-test for a given Welch statistic to a confidence level of 0.1%.
    let welchTest w =
        if abs w.T < Array.get tInv (min w.DF (Array.length tInv) - 1) then 0
        else sign w.T

Performance testing

Three performance metrics will be created: time, memory allocated, and garbage collections.

Each of these will be repeated until at least the metric target is reached to ensure it is reliably measuring the metric and not any framework overhead.

Statistics functions will be created for each of the metrics to measure a function accurate to a mean standard error of 1%.

Compare functions will be created for each of the metrics to compare two functions to a confidence level of 0.1%. The functions will be considered equal after 10,000 degrees of freedom have past without a positive test result.

Performance testing code

module Performance =
    /// Find the iteration count to get to at least the metric target.
    let inline private targetIterationCount metric metricTarget f =
        let rec find n =
            let item = metric n f
            if (item<<<3)>=metricTarget then n*int metricTarget/int item+1
            else find (n*10)
        find 1
        
    /// Create and iterate a statistics sequence for a metric until the given accuracy.
    let inline private measureStatistics (metric,metricTarget) relativeError f =
        let ic = targetIterationCount metric metricTarget f
        Seq.initInfinite (fun _ -> metric ic f) |> sampleStatistics
        |> Seq.map (singleIteration ic)
        |> Seq.find (fun s -> s.MeanStandardError<=relativeError*s.Mean)

    /// Create and iterate two statistics sequences until the metric means can be
    /// compared.
    let inline private measureCompare (metric,metricTarget) f1 f2 =
        if f1 id<>f2 id then failwith "function results are not the same"
        let ic = targetIterationCount metric metricTarget f2
        let stats f = Seq.initInfinite (fun _ -> metric ic f)
                      |> sampleStatistics
                      |> Seq.map (singleIteration ic)
        Seq.map2 welchStatistic (stats f1) (stats f2)
        |> Seq.pick (fun w ->
            let maxDF = 10000
            if w.DF>maxDF then Some 0 else match welchTest w with |0->None |c->Some c)

    /// Measure the given function for the iteration count using the start and end
    /// metric.
    let inline private measureMetric startMetric endMetric ic f =
        GC.Collect()
        GC.WaitForPendingFinalizers()
        GC.Collect()
        let mutable total = LanguagePrimitives.GenericZero
        let measurer toMeasure =
            fun args ->
                let s = startMetric()
                let ret = toMeasure args
                let m = endMetric s
                total<-total+m
                ret
        let rec loop i = if i>0 then f measurer |> ignore; loop (i-1)
        loop ic
        total
    
    /// Measure the time metric for the given function and iteration count.
    let private timeMetric ic f = 
        measureMetric Stopwatch.GetTimestamp (fun t -> Stopwatch.GetTimestamp()-t) ic f
        
    /// Measure the memory metric for the given function and iteration count.
    let private memoryMetric ic f =
        let inline startMetric() =
            if GC.TryStartNoGCRegion(1L<<<22) |> not then failwith "TryStartNoGCRegion"
            GC.GetTotalMemory false
        let inline endMetric s =
            let t = GC.GetTotalMemory false - s
            GC.EndNoGCRegion()
            t
        measureMetric startMetric endMetric ic f

    /// Measure the garbage collection metric for the given function and iteration
    /// count.
    let private garbageMetric ic f =
        let count() = GC.CollectionCount 0 + GC.CollectionCount 1 + GC.CollectionCount 2
        measureMetric count (fun s -> count()-s) ic f

    /// Measure definitions which are a metric together with a metric target.
    let private oneMillisecond = Stopwatch.Frequency/1000L
    let private timeMeasure = timeMetric, oneMillisecond
    let private memoryMeasure = memoryMetric, 1024L //1KB
    let private garbageMeasure = garbageMetric, 10

    /// Time statistics for a given function accurate to a mean standard error of 1%.
    let timeStatistics f =
        measureStatistics timeMeasure 0.01 f |> scale (1.0/float Stopwatch.Frequency)
    /// Memory statistics for a given function accurate to a mean standard error of 1%.
    let memoryStatistics f = measureStatistics memoryMeasure 0.01 f
    /// GC count statistics for a given function accurate to a mean standard error of
    /// 1%.
    let gcCountStatistics f = measureStatistics garbageMeasure 0.01 f
    
    /// Time comparison for two given functions to a confidence level of 0.1%.
    let timeCompare f1 f2 = measureCompare timeMeasure f1 f2
    /// Memory comparison for two given functions to a confidence level of 0.1%.
    let memoryCompare f1 f2 = measureCompare memoryMeasure f1 f2
    /// GC count comparison for two given functions to a confidence level of 0.1%.
    let gcCountCompare f1 f2 = measureCompare garbageMeasure f1 f2

Conclusion

The performance testing functions have a very simple signature. The statistics functions just take the function to be measured. The compare functions just take the two functions to be compared.

The statistics functions give an overview of a function's performance. These can easily be combined to produce a useful performance report.

The compare functions can be used in unit tests since they are a relative test and hence should be independent of the machine. They are also fast since they stop as soon as the given confidence level is achieved. The compare functions could also be extended to test if a function is a given percentage better than another.

Modularity from higher-order functions and lazy evaluation together with a little maths have produced a simple yet powerful performance testing library.

UPDATED (2016-10-21): The functions have been extended to be able to measure a subfunction of the passed in functions. This enables set up to be done without effecting the measurement. The subfunction can be called multiple times and the metric will be aggregated. An example of its use can be found in the Outliers and MAD post.

UPDATED (2017-01-03): Performance testing functionality has been added to the F# testing library Expecto using the code in this post.

namespace System
namespace System.Diagnostics
Multiple items
type AutoOpenAttribute =
  inherit Attribute
  new : unit -> AutoOpenAttribute
  new : path:string -> AutoOpenAttribute
  member Path : string

--------------------
new : unit -> AutoOpenAttribute
new : path:string -> AutoOpenAttribute
val sqr : x:'a -> 'b (requires member ( * ))
val x : 'a (requires member ( * ))
SampleStatistics.N: int
Multiple items
val int : value:'T -> int (requires member op_Explicit)

--------------------
type int = int32

--------------------
type int<'Measure> = int
SampleStatistics.Mean: float
Multiple items
val float : value:'T -> float (requires member op_Explicit)

--------------------
type float = Double

--------------------
type float<'Measure> = float
SampleStatistics.Variance: float
val s : SampleStatistics
val sqrt : value:'T -> 'U (requires member Sqrt)
type WelchStatistic =
  {T: float;
   DF: int;}
WelchStatistic.T: float
WelchStatistic.DF: int
val sampleStatistics : s:seq<'a> -> seq<SampleStatistics> (requires member op_Explicit)


 Online statistics sequence for a given sample sequence.
val s : seq<'a> (requires member op_Explicit)
val calc : (int * float * float -> float -> int * float * float)
val n : int
val m : float
val s : float
val x : float
val m' : float
module Seq

from Microsoft.FSharp.Collections
val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>
val scan : folder:('State -> 'T -> 'State) -> state:'State -> source:seq<'T> -> seq<'State>
val skip : count:int -> source:seq<'T> -> seq<'T>
val scale : f:float -> s:SampleStatistics -> SampleStatistics


 Scale the statistics for a given underlying random variable change of scale.
val f : float
val singleIteration : ic:int -> s:SampleStatistics -> SampleStatistics


 Single iteration statistics for a given iteration count and total statistics.
val ic : int
val private tInv : float []


 Student's t-distribution inverse for 0.1% confidence level by
 degrees of freedom.
val welchStatistic : s1:SampleStatistics -> s2:SampleStatistics -> WelchStatistic


 Welch's t-test statistic for two given sample statistics.
val s1 : SampleStatistics
val s2 : SampleStatistics
val f1 : float
val f2 : float
val welchTest : w:WelchStatistic -> int


 Welch's t-test for a given Welch statistic to a confidence level of 0.1%.
val w : WelchStatistic
val abs : value:'T -> 'T (requires member Abs)
type Array =
  member Clone : unit -> obj
  member CopyTo : array:Array * index:int -> unit + 1 overload
  member GetEnumerator : unit -> IEnumerator
  member GetLength : dimension:int -> int
  member GetLongLength : dimension:int -> int64
  member GetLowerBound : dimension:int -> int
  member GetUpperBound : dimension:int -> int
  member GetValue : [<ParamArray>] indices:int[] -> obj + 7 overloads
  member Initialize : unit -> unit
  member IsFixedSize : bool
  ...
val get : array:'T [] -> index:int -> 'T
val min : e1:'T -> e2:'T -> 'T (requires comparison)
val length : array:'T [] -> int
val sign : value:'T -> int (requires member get_Sign)
module Statistics

from Main
module Performance

from Main
val private targetIterationCount : metric:(int -> 'a -> 'b) -> metricTarget:'b -> f:'a -> int (requires member ( <<< ) and comparison and member op_Explicit)


 Find the iteration count to get to at least the metric target.
val metric : (int -> 'a -> 'b) (requires member ( <<< ) and comparison and member op_Explicit)
val metricTarget : 'b (requires member ( <<< ) and comparison and member op_Explicit)
val f : 'a
val find : (int -> int)
val item : 'b (requires member ( <<< ) and comparison and member op_Explicit)
val private measureStatistics : metric:(int -> 'a -> 'b) * metricTarget:'b -> relativeError:float -> f:'a -> SampleStatistics (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)


 Create and iterate a statistics sequence for a metric until the given accuracy.
val metric : (int -> 'a -> 'b) (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val metricTarget : 'b (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val relativeError : float
val initInfinite : initializer:(int -> 'T) -> seq<'T>
val find : predicate:('T -> bool) -> source:seq<'T> -> 'T
property SampleStatistics.MeanStandardError: float
val private measureCompare : metric:(int -> (('a -> 'a) -> 'b) -> 'c) * metricTarget:'c -> f1:(('a -> 'a) -> 'b) -> f2:(('a -> 'a) -> 'b) -> int (requires equality and member ( <<< ) and comparison and member op_Explicit and member op_Explicit)


 Create and iterate two statistics sequences until the metric means can be
 compared.
val metric : (int -> (('a -> 'a) -> 'b) -> 'c) (requires equality and member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val metricTarget : 'c (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val f1 : (('a -> 'a) -> 'b) (requires equality)
val f2 : (('a -> 'a) -> 'b) (requires equality)
val id : x:'T -> 'T
val failwith : message:string -> 'T
val stats : ((('a -> 'a) -> 'b) -> seq<SampleStatistics>) (requires equality)
val f : (('a -> 'a) -> 'b) (requires equality)
val map2 : mapping:('T1 -> 'T2 -> 'U) -> source1:seq<'T1> -> source2:seq<'T2> -> seq<'U>
val pick : chooser:('T -> 'U option) -> source:seq<'T> -> 'U
val maxDF : int
union case Option.Some: Value: 'T -> Option<'T>
union case Option.None: Option<'T>
val c : int
val private measureMetric : startMetric:(unit -> 'a) -> endMetric:('a -> 'b) -> ic:int -> f:((('d -> 'e) -> 'd -> 'e) -> 'f) -> 'c (requires member ( + ) and member get_Zero)


 Measure the given function for the iteration count using the start and end
 metric.
val startMetric : (unit -> 'a)
val endMetric : ('a -> 'b) (requires member ( + ) and member get_Zero)
val f : ((('d -> 'e) -> 'd -> 'e) -> 'f)
type GC =
  static member AddMemoryPressure : bytesAllocated:int64 -> unit
  static member CancelFullGCNotification : unit -> unit
  static member Collect : unit -> unit + 4 overloads
  static member CollectionCount : generation:int -> int
  static member EndNoGCRegion : unit -> unit
  static member GetGeneration : obj:obj -> int + 1 overload
  static member GetTotalMemory : forceFullCollection:bool -> int64
  static member KeepAlive : obj:obj -> unit
  static member MaxGeneration : int
  static member ReRegisterForFinalize : obj:obj -> unit
  ...
GC.Collect() : unit
GC.Collect(generation: int) : unit
GC.Collect(generation: int, mode: GCCollectionMode) : unit
GC.Collect(generation: int, mode: GCCollectionMode, blocking: bool) : unit
GC.Collect(generation: int, mode: GCCollectionMode, blocking: bool, compacting: bool) : unit
GC.WaitForPendingFinalizers() : unit
val mutable total : 'c (requires member get_Zero and member ( + ))
module LanguagePrimitives

from Microsoft.FSharp.Core
val GenericZero<'T (requires member get_Zero)> : 'T (requires member get_Zero)
val measurer : (('g -> 'h) -> 'g -> 'h)
val toMeasure : ('g -> 'h)
val args : 'g
val s : 'a
val ret : 'h
val m : 'b (requires member ( + ) and member get_Zero)
val loop : (int -> unit)
val i : int
val ignore : value:'T -> unit
val private timeMetric : ic:int -> f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64


 Measure the time metric for the given function and iteration count.
val f : ((('a -> 'b) -> 'a -> 'b) -> 'c)
Multiple items
type Stopwatch =
  new : unit -> Stopwatch
  member Elapsed : TimeSpan
  member ElapsedMilliseconds : int64
  member ElapsedTicks : int64
  member IsRunning : bool
  member Reset : unit -> unit
  member Restart : unit -> unit
  member Start : unit -> unit
  member Stop : unit -> unit
  static val Frequency : int64
  ...

--------------------
Stopwatch() : Stopwatch
Stopwatch.GetTimestamp() : int64
val t : int64
val private memoryMetric : ic:int -> f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64


 Measure the memory metric for the given function and iteration count.
val startMetric : (unit -> int64)
GC.TryStartNoGCRegion(totalSize: int64) : bool
GC.TryStartNoGCRegion(totalSize: int64, disallowFullBlockingGC: bool) : bool
GC.TryStartNoGCRegion(totalSize: int64, lohSize: int64) : bool
GC.TryStartNoGCRegion(totalSize: int64, lohSize: int64, disallowFullBlockingGC: bool) : bool
val not : value:bool -> bool
GC.GetTotalMemory(forceFullCollection: bool) : int64
val endMetric : (int64 -> int64)
val s : int64
GC.EndNoGCRegion() : unit
val private garbageMetric : ic:int -> f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int


 Measure the garbage collection metric for the given function and iteration
 count.
val count : (unit -> int)
GC.CollectionCount(generation: int) : int
val s : int
val private oneMillisecond : int64


 Measure definitions which are a metric together with a metric target.
field Stopwatch.Frequency: int64
val private timeMeasure : (int -> ((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64) * int64
val private memoryMeasure : (int -> ((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64) * int64
val private garbageMeasure : (int -> ((('a -> 'b) -> 'a -> 'b) -> 'c) -> int) * int
val timeStatistics : f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> SampleStatistics


 Time statistics for a given function accurate to a mean standard error of 1%.
val memoryStatistics : f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> SampleStatistics


 Memory statistics for a given function accurate to a mean standard error of 1%.
val gcCountStatistics : f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> SampleStatistics


 GC count statistics for a given function accurate to a mean standard error of
 1%.
val timeCompare : f1:((('a -> 'b) -> 'a -> 'b) -> 'c) -> f2:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int (requires equality)


 Time comparison for two given functions to a confidence level of 0.1%.
val f1 : ((('a -> 'b) -> 'a -> 'b) -> 'c) (requires equality)
val f2 : ((('a -> 'b) -> 'a -> 'b) -> 'c) (requires equality)
val memoryCompare : f1:((('a -> 'b) -> 'a -> 'b) -> 'c) -> f2:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int (requires equality)


 Memory comparison for two given functions to a confidence level of 0.1%.
val gcCountCompare : f1:((('a -> 'b) -> 'a -> 'b) -> 'c) -> f2:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int (requires equality)


 GC count comparison for two given functions to a confidence level of 0.1%.