Modularity from Lazy Evaluation - Performance Testing

This is the second example of how higher-order functions and lazy evaluation can reduce complexity and lead to more modular software.

Background

To performance test code a number of iterations have to be performed so a more stable average can be compared. The number of iterations is usually an input and chosen arbitrarily.

This post covers how statistical tests can be used to remove the need for this input. This simplifies performance testing and makes the results more robust and useful.

Statistics

Lazy evaluation can be used to produce an online sample statistics sequence.

The standard error of a sample mean is given by

\[SE_{\bar{x}} = \frac{s}{\sqrt{n}}\]

The statistics sequence can be iterated until a given mean standard error level of accuracy is achieved.

Welch's t-test for comparing the mean of two samples is given by

\[t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{f_1+f_2}}\]

\[df = \frac{\left(f_1+f_2\right)^2}{\frac{f_1^2}{n_1-1}+\frac{f_2^2}{n_2-1}}\]

\[f_i = \frac{s_i^2}{n_i}\]

where \(n\) is the sample size, \(\bar{x}\) is the sample mean, \(s^2\) is the sample variance, \(t\) is Welch's t statistic, and \(df\) is the degrees of freedom of the statistic.

Welch's t statistic can be compared to the inverse Student's t-distribution for a given confidence level to test if the sample means are different.

Two sample statistics sequences can be iterated together and compared to decide if one mean is larger than the other.

Statistics code

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
type SampleStatistics = {N:int;Mean:float;Variance:float}
                        member s.StandardDeviation = sqrt s.Variance
                        member s.MeanStandardError = sqrt(s.Variance/float s.N)

type WelchStatistic = {T:float;DF:int}

module Statistics =
    /// Online statistics sequence for a given sample sequence.
    let inline sampleStatistics s =
        let calc (n,m,s) x =
            let m'=m+(x-m)/float(n+1)
            n+1,m',s+(x-m)*(x-m')
        Seq.map float s |> Seq.scan calc (0,0.0,0.0) |> Seq.skip 3
        |> Seq.map (fun (n,m,s) -> {N=n;Mean=m;Variance=s/float(n-1)})

    /// Scale the statistics for a given underlying random variable change of scale.
    let scale f s = {s with Mean=s.Mean*f;Variance=s.Variance*sqr f}

    /// Single iteration statistics for a given iteration count and total statistics.
    let singleIteration ic s = {N=s.N*ic;Mean=s.Mean/float ic;Variance=s.Variance/float ic}

    /// Student's t-distribution inverse for 0.1% confidence level by degrees of freedom.
    let private tInv = 
        [|636.6;31.60;12.92;8.610;6.869;5.959;5.408;5.041;4.781;4.587;4.437;4.318;4.221;
          4.140;4.073;4.015;3.965;3.922;3.883;3.850;3.819;3.792;3.768;3.745;3.725;3.707;
          3.690;3.674;3.659;3.646;3.633;3.622;3.611;3.601;3.591;3.582;3.574;3.566;3.558;
          3.551;3.544;3.538;3.532;3.526;3.520;3.515;3.510;3.505;3.500;3.496;3.492;3.488;
          3.484;3.480;3.476;3.473;3.470;3.466;3.463;3.460;3.457;3.454;3.452;3.449;3.447;
          3.444;3.442;3.439;3.437;3.435;3.433;3.431;3.429;3.427;3.425;3.423;3.421;3.420;
          3.418;3.416;3.415;3.413;3.412;3.410;3.409;3.407;3.406;3.405;3.403;3.402;3.401;
          3.399;3.398;3.397;3.396;3.395;3.394;3.393;3.392;3.390|]

    /// Welch's t-test statistic for two given sample statistics.
    let welchStatistic s1 s2 =
        let f1 = s1.Variance/float s1.N
        let f2 = s2.Variance/float s2.N
        {
            T = (s1.Mean-s2.Mean)/sqrt(f1+f2)
            DF= if f1=0.0 && f2=0.0 then 1
                else sqr(f1+f2)/(sqr f1/float(s1.N-1)+sqr f2/float(s2.N-1)) |> int
        }

    /// Welch's t-test for a given Welch statistic to a confidence level of 0.1%.
    let welchTest w =
        if abs w.T < Array.get tInv (min w.DF (Array.length tInv) - 1) then 0 else sign w.T

Performance testing

Three performance metrics will be created: time, memory allocated, and garbage collections.

Each of these will be repeated until at least the metric target is reached to ensure it is reliably measuring the metric and not any framework overhead.

Statistics functions will be created for each of the metrics to measure a function accurate to a mean standard error of 1%.

Compare functions will be created for each of the metrics to compare two functions to a confidence level of 0.1%. The functions will be considered equal after 10,000 degrees of freedom have past without a positive test result.

Performance testing code

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
46: 
47: 
48: 
49: 
50: 
51: 
52: 
53: 
54: 
55: 
56: 
57: 
58: 
59: 
60: 
61: 
62: 
63: 
64: 
65: 
66: 
67: 
68: 
69: 
70: 
71: 
72: 
73: 
74: 
75: 
76: 
77: 
78: 
79: 
80: 
81: 
82: 
83: 
84: 
85: 
module Performance =
    /// Find the iteration count to get to at least the metric target.
    let inline private targetIterationCount metric metricTarget f =
        let rec find n =
            let item = metric n f
            if (item<<<3)>=metricTarget then n*int metricTarget/int item+1
            else find (n*10)
        find 1
        
    /// Create and iterate a statistics sequence for a metric until the given accuracy.
    let inline private measureStatistics (metric,metricTarget) relativeError f =
        let ic = targetIterationCount metric metricTarget f
        Seq.initInfinite (fun _ -> metric ic f) |> sampleStatistics
        |> Seq.map (singleIteration ic)
        |> Seq.find (fun s -> s.MeanStandardError<=relativeError*s.Mean)

    /// Create and iterate two statistics sequences until the metric means can be compared.
    let inline private measureCompare (metric,metricTarget) f1 f2 =
        if f1 id<>f2 id then failwith "function results are not the same"
        let ic = targetIterationCount metric metricTarget f2
        let stats f = Seq.initInfinite (fun _ -> metric ic f)
                      |> sampleStatistics
                      |> Seq.map (singleIteration ic)
        Seq.map2 welchStatistic (stats f1) (stats f2)
        |> Seq.pick (fun w ->
            let maxDF = 10000
            if w.DF>maxDF then Some 0 else match welchTest w with |0->None |c->Some c)

    /// Measure the given function for the iteration count using the start and end metric.
    let inline private measureMetric startMetric endMetric ic f =
        GC.Collect()
        GC.WaitForPendingFinalizers()
        GC.Collect()
        let mutable total = LanguagePrimitives.GenericZero
        let measurer toMeasure =
            fun args ->
                let s = startMetric()
                let ret = toMeasure args
                let m = endMetric s
                total<-total+m
                ret
        let rec loop i = if i>0 then f measurer |> ignore; loop (i-1)
        loop ic
        total
    
    /// Measure the time metric for the given function and iteration count.
    let private timeMetric ic f = 
        measureMetric Stopwatch.GetTimestamp (fun t -> Stopwatch.GetTimestamp()-t) ic f
        
    /// Measure the memory metric for the given function and iteration count.
    let private memoryMetric ic f =
        let inline startMetric() =
            if GC.TryStartNoGCRegion(1L<<<22) |> not then failwith "TryStartNoGCRegion"
            GC.GetTotalMemory false
        let inline endMetric s =
            let t = GC.GetTotalMemory false - s
            GC.EndNoGCRegion()
            t
        measureMetric startMetric endMetric ic f

    /// Measure the garbage collection metric for the given function and iteration count.
    let private garbageMetric ic f =
        let count() = GC.CollectionCount 0 + GC.CollectionCount 1 + GC.CollectionCount 2
        measureMetric count (fun s -> count()-s) ic f

    /// Measure definitions which are a metric together with a metric target.
    let private oneMillisecond = Stopwatch.Frequency/1000L
    let private timeMeasure = timeMetric, oneMillisecond
    let private memoryMeasure = memoryMetric, 1024L //1KB
    let private garbageMeasure = garbageMetric, 10

    /// Time statistics for a given function accurate to a mean standard error of 1%.
    let timeStatistics f =
        measureStatistics timeMeasure 0.01 f |> scale (1.0/float Stopwatch.Frequency)
    /// Memory statistics for a given function accurate to a mean standard error of 1%.
    let memoryStatistics f = measureStatistics memoryMeasure 0.01 f
    /// GC count statistics for a given function accurate to a mean standard error of 1%.
    let gcCountStatistics f = measureStatistics garbageMeasure 0.01 f
    
    /// Time comparison for two given functions to a confidence level of 0.1%.
    let timeCompare f1 f2 = measureCompare timeMeasure f1 f2
    /// Memory comparison for two given functions to a confidence level of 0.1%.
    let memoryCompare f1 f2 = measureCompare memoryMeasure f1 f2
    /// GC count comparison for two given functions to a confidence level of 0.1%.
    let gcCountCompare f1 f2 = measureCompare garbageMeasure f1 f2

Conclusion

The performance testing functions have a very simple signature. The statistics functions just take the function to be measured. The compare functions just take the two functions to be compared.

The statistics functions give an overview of a function's performance. These can easily be combined to produce a useful performance report.

The compare functions can be used in unit tests since they are a relative test and hence should be independent of the machine. They are also fast since they stop as soon as the given confidence level is achieved. The compare functions could also be extended to test if a function is a given percentage better than another.

Modularity from higher-order functions and lazy evaluation together with a little maths have produced a simple yet powerful performance testing library.

UPDATED (2016-10-21): The functions have been extended to be able to measure a subfunction of the passed in functions. This enables set up to be done without effecting the measurement. The subfunction can be called multiple times and the metric will be aggregated. An example of its use can be found in the Outliers and MAD post.

UPDATED (2017-01-03): Performance testing functionality has been added to the F# testing library Expecto using the code in this post.

namespace System
namespace System.Diagnostics
Multiple items
type AutoOpenAttribute =
  inherit Attribute
  new : unit -> AutoOpenAttribute
  new : path:string -> AutoOpenAttribute
  member Path : string

Full name: Microsoft.FSharp.Core.AutoOpenAttribute

--------------------
new : unit -> AutoOpenAttribute
new : path:string -> AutoOpenAttribute
val sqr : x:'a -> 'b (requires member ( * ))

Full name: Main.Functions.sqr
val x : 'a (requires member ( * ))
type SampleStatistics =
  {N: int;
   Mean: float;
   Variance: float;}
  member MeanStandardError : float
  member StandardDeviation : float

Full name: Main.SampleStatistics
SampleStatistics.N: int
Multiple items
val int : value:'T -> int (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.int

--------------------
type int = int32

Full name: Microsoft.FSharp.Core.int

--------------------
type int<'Measure> = int

Full name: Microsoft.FSharp.Core.int<_>
SampleStatistics.Mean: float
Multiple items
val float : value:'T -> float (requires member op_Explicit)

Full name: Microsoft.FSharp.Core.Operators.float

--------------------
type float = Double

Full name: Microsoft.FSharp.Core.float

--------------------
type float<'Measure> = float

Full name: Microsoft.FSharp.Core.float<_>
SampleStatistics.Variance: float
val s : SampleStatistics
member SampleStatistics.StandardDeviation : float

Full name: Main.SampleStatistics.StandardDeviation
val sqrt : value:'T -> 'U (requires member Sqrt)

Full name: Microsoft.FSharp.Core.Operators.sqrt
member SampleStatistics.MeanStandardError : float

Full name: Main.SampleStatistics.MeanStandardError
type WelchStatistic =
  {T: float;
   DF: int;}

Full name: Main.WelchStatistic
WelchStatistic.T: float
WelchStatistic.DF: int
val sampleStatistics : s:seq<'a> -> seq<SampleStatistics> (requires member op_Explicit)

Full name: Main.Statistics.sampleStatistics


 Online statistics sequence for a given sample sequence.
val s : seq<'a> (requires member op_Explicit)
val calc : (int * float * float -> float -> int * float * float)
val n : int
val m : float
val s : float
val x : float
val m' : float
module Seq

from Microsoft.FSharp.Collections
val map : mapping:('T -> 'U) -> source:seq<'T> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map
val scan : folder:('State -> 'T -> 'State) -> state:'State -> source:seq<'T> -> seq<'State>

Full name: Microsoft.FSharp.Collections.Seq.scan
val skip : count:int -> source:seq<'T> -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.skip
val scale : f:float -> s:SampleStatistics -> SampleStatistics

Full name: Main.Statistics.scale


 Scale the statistics for a given underlying random variable change of scale.
val f : float
val singleIteration : ic:int -> s:SampleStatistics -> SampleStatistics

Full name: Main.Statistics.singleIteration


 Single iteration statistics for a given iteration count and total statistics.
val ic : int
val private tInv : float []

Full name: Main.Statistics.tInv


 Student's t-distribution inverse for 0.1% confidence level by degrees of freedom.
val welchStatistic : s1:SampleStatistics -> s2:SampleStatistics -> WelchStatistic

Full name: Main.Statistics.welchStatistic


 Welch's t-test statistic for two given sample statistics.
val s1 : SampleStatistics
val s2 : SampleStatistics
val f1 : float
val f2 : float
val welchTest : w:WelchStatistic -> int

Full name: Main.Statistics.welchTest


 Welch's t-test for a given Welch statistic to a confidence level of 0.1%.
val w : WelchStatistic
val abs : value:'T -> 'T (requires member Abs)

Full name: Microsoft.FSharp.Core.Operators.abs
type Array =
  member Clone : unit -> obj
  member CopyTo : array:Array * index:int -> unit + 1 overload
  member GetEnumerator : unit -> IEnumerator
  member GetLength : dimension:int -> int
  member GetLongLength : dimension:int -> int64
  member GetLowerBound : dimension:int -> int
  member GetUpperBound : dimension:int -> int
  member GetValue : [<ParamArray>] indices:int[] -> obj + 7 overloads
  member Initialize : unit -> unit
  member IsFixedSize : bool
  ...

Full name: System.Array
val get : array:'T [] -> index:int -> 'T

Full name: Microsoft.FSharp.Collections.Array.get
val min : e1:'T -> e2:'T -> 'T (requires comparison)

Full name: Microsoft.FSharp.Core.Operators.min
val length : array:'T [] -> int

Full name: Microsoft.FSharp.Collections.Array.length
val sign : value:'T -> int (requires member get_Sign)

Full name: Microsoft.FSharp.Core.Operators.sign
module Statistics

from Main
module Performance

from Main
val private targetIterationCount : metric:(int -> 'a -> 'b) -> metricTarget:'b -> f:'a -> int (requires member ( <<< ) and comparison and member op_Explicit)

Full name: Main.Performance.targetIterationCount


 Find the iteration count to get to at least the metric target.
val metric : (int -> 'a -> 'b) (requires member ( <<< ) and comparison and member op_Explicit)
val metricTarget : 'b (requires member ( <<< ) and comparison and member op_Explicit)
val f : 'a
val find : (int -> int)
val item : 'b (requires member ( <<< ) and comparison and member op_Explicit)
val private measureStatistics : metric:(int -> 'a -> 'b) * metricTarget:'b -> relativeError:float -> f:'a -> SampleStatistics (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)

Full name: Main.Performance.measureStatistics


 Create and iterate a statistics sequence for a metric until the given accuracy.
val metric : (int -> 'a -> 'b) (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val metricTarget : 'b (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val relativeError : float
val initInfinite : initializer:(int -> 'T) -> seq<'T>

Full name: Microsoft.FSharp.Collections.Seq.initInfinite
val find : predicate:('T -> bool) -> source:seq<'T> -> 'T

Full name: Microsoft.FSharp.Collections.Seq.find
property SampleStatistics.MeanStandardError: float
val private measureCompare : metric:(int -> (('a -> 'a) -> 'b) -> 'c) * metricTarget:'c -> f1:(('a -> 'a) -> 'b) -> f2:(('a -> 'a) -> 'b) -> int (requires equality and member ( <<< ) and comparison and member op_Explicit and member op_Explicit)

Full name: Main.Performance.measureCompare


 Create and iterate two statistics sequences until the metric means can be compared.
val metric : (int -> (('a -> 'a) -> 'b) -> 'c) (requires equality and member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val metricTarget : 'c (requires member ( <<< ) and comparison and member op_Explicit and member op_Explicit)
val f1 : (('a -> 'a) -> 'b) (requires equality)
val f2 : (('a -> 'a) -> 'b) (requires equality)
val id : x:'T -> 'T

Full name: Microsoft.FSharp.Core.Operators.id
val failwith : message:string -> 'T

Full name: Microsoft.FSharp.Core.Operators.failwith
val stats : ((('a -> 'a) -> 'b) -> seq<SampleStatistics>) (requires equality)
val f : (('a -> 'a) -> 'b) (requires equality)
val map2 : mapping:('T1 -> 'T2 -> 'U) -> source1:seq<'T1> -> source2:seq<'T2> -> seq<'U>

Full name: Microsoft.FSharp.Collections.Seq.map2
val pick : chooser:('T -> 'U option) -> source:seq<'T> -> 'U

Full name: Microsoft.FSharp.Collections.Seq.pick
val maxDF : int
union case Option.Some: Value: 'T -> Option<'T>
union case Option.None: Option<'T>
val c : int
val private measureMetric : startMetric:(unit -> 'a) -> endMetric:('a -> 'b) -> ic:int -> f:((('d -> 'e) -> 'd -> 'e) -> 'f) -> 'c (requires member ( + ) and member get_Zero)

Full name: Main.Performance.measureMetric


 Measure the given function for the iteration count using the start and end metric.
val startMetric : (unit -> 'a)
val endMetric : ('a -> 'b) (requires member ( + ) and member get_Zero)
val f : ((('d -> 'e) -> 'd -> 'e) -> 'f)
type GC =
  static member AddMemoryPressure : bytesAllocated:int64 -> unit
  static member CancelFullGCNotification : unit -> unit
  static member Collect : unit -> unit + 2 overloads
  static member CollectionCount : generation:int -> int
  static member GetGeneration : obj:obj -> int + 1 overload
  static member GetTotalMemory : forceFullCollection:bool -> int64
  static member KeepAlive : obj:obj -> unit
  static member MaxGeneration : int
  static member ReRegisterForFinalize : obj:obj -> unit
  static member RegisterForFullGCNotification : maxGenerationThreshold:int * largeObjectHeapThreshold:int -> unit
  ...

Full name: System.GC
GC.Collect() : unit
GC.Collect(generation: int) : unit
GC.Collect(generation: int, mode: GCCollectionMode) : unit
GC.WaitForPendingFinalizers() : unit
val mutable total : 'c (requires member get_Zero and member ( + ))
module LanguagePrimitives

from Microsoft.FSharp.Core
val GenericZero<'T (requires member get_Zero)> : 'T (requires member get_Zero)

Full name: Microsoft.FSharp.Core.LanguagePrimitives.GenericZero
val measurer : (('g -> 'h) -> 'g -> 'h)
val toMeasure : ('g -> 'h)
val args : 'g
val s : 'a
val ret : 'h
val m : 'b (requires member ( + ) and member get_Zero)
val loop : (int -> unit)
val i : int
val ignore : value:'T -> unit

Full name: Microsoft.FSharp.Core.Operators.ignore
val private timeMetric : ic:int -> f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64

Full name: Main.Performance.timeMetric


 Measure the time metric for the given function and iteration count.
val f : ((('a -> 'b) -> 'a -> 'b) -> 'c)
Multiple items
type Stopwatch =
  new : unit -> Stopwatch
  member Elapsed : TimeSpan
  member ElapsedMilliseconds : int64
  member ElapsedTicks : int64
  member IsRunning : bool
  member Reset : unit -> unit
  member Restart : unit -> unit
  member Start : unit -> unit
  member Stop : unit -> unit
  static val Frequency : int64
  ...

Full name: System.Diagnostics.Stopwatch

--------------------
Stopwatch() : unit
Stopwatch.GetTimestamp() : int64
val t : int64
val private memoryMetric : ic:int -> f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64

Full name: Main.Performance.memoryMetric


 Measure the memory metric for the given function and iteration count.
val startMetric : (unit -> int64)
val not : value:bool -> bool

Full name: Microsoft.FSharp.Core.Operators.not
GC.GetTotalMemory(forceFullCollection: bool) : int64
val endMetric : (int64 -> int64)
val s : int64
val private garbageMetric : ic:int -> f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int

Full name: Main.Performance.garbageMetric


 Measure the garbage collection metric for the given function and iteration count.
val count : (unit -> int)
GC.CollectionCount(generation: int) : int
val s : int
val private oneMillisecond : int64

Full name: Main.Performance.oneMillisecond


 Measure definitions which are a metric together with a metric target.
field Stopwatch.Frequency
val private timeMeasure : (int -> ((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64) * int64

Full name: Main.Performance.timeMeasure
val private memoryMeasure : (int -> ((('a -> 'b) -> 'a -> 'b) -> 'c) -> int64) * int64

Full name: Main.Performance.memoryMeasure
val private garbageMeasure : (int -> ((('a -> 'b) -> 'a -> 'b) -> 'c) -> int) * int

Full name: Main.Performance.garbageMeasure
val timeStatistics : f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> SampleStatistics

Full name: Main.Performance.timeStatistics


 Time statistics for a given function accurate to a mean standard error of 1%.
val memoryStatistics : f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> SampleStatistics

Full name: Main.Performance.memoryStatistics


 Memory statistics for a given function accurate to a mean standard error of 1%.
val gcCountStatistics : f:((('a -> 'b) -> 'a -> 'b) -> 'c) -> SampleStatistics

Full name: Main.Performance.gcCountStatistics


 GC count statistics for a given function accurate to a mean standard error of 1%.
val timeCompare : f1:((('a -> 'b) -> 'a -> 'b) -> 'c) -> f2:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int (requires equality)

Full name: Main.Performance.timeCompare


 Time comparison for two given functions to a confidence level of 0.1%.
val f1 : ((('a -> 'b) -> 'a -> 'b) -> 'c) (requires equality)
val f2 : ((('a -> 'b) -> 'a -> 'b) -> 'c) (requires equality)
val memoryCompare : f1:((('a -> 'b) -> 'a -> 'b) -> 'c) -> f2:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int (requires equality)

Full name: Main.Performance.memoryCompare


 Memory comparison for two given functions to a confidence level of 0.1%.
val gcCountCompare : f1:((('a -> 'b) -> 'a -> 'b) -> 'c) -> f2:((('a -> 'b) -> 'a -> 'b) -> 'c) -> int (requires equality)

Full name: Main.Performance.gcCountCompare


 GC count comparison for two given functions to a confidence level of 0.1%.