Modularity from Lazy Evaluation - Performance Testing
20 May 2016

This is the second example of how higher-order functions and lazy evaluation can reduce complexity and lead to more modular software.
Background
To performance test code, a number of iterations have to be performed so that a more stable average can be compared. The number of iterations is usually an input to the test and is chosen arbitrarily.
This post covers how statistical tests can be used to remove the need for this input. This simplifies performance testing and makes the results more robust and useful.
Statistics
Lazy evaluation can be used to produce an online sample statistics sequence.
The standard error of a sample mean is given by
\[SE_{\bar{x}} = \frac{s}{\sqrt{n}}\]
The statistics sequence can be iterated until the standard error of the mean falls below a given level of accuracy.
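For example, by this formula a sample with standard deviation \(s = 2\) needs a sample size of at least \(n = 400\) before \(SE_{\bar{x}}\) falls below \(0.1\).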
Welch's t-test for comparing the mean of two samples is given by
\[t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{f_1+f_2}}\]
\[df = \frac{\left(f_1+f_2\right)^2}{\frac{f_1^2}{n_1-1}+\frac{f_2^2}{n_2-1}}\]
\[f_i = \frac{s_i^2}{n_i}\]
where \(n\) is the sample size, \(\bar{x}\) is the sample mean, \(s^2\) is the sample variance, \(t\) is Welch's t statistic, and \(df\) is the degrees of freedom of the statistic.
Welch's t statistic can be compared to the inverse of Student's t-distribution at a given confidence level to test whether the sample means are different.
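For example, for two hypothetical samples with \(n_1 = n_2 = 25\), \(\bar{x}_1 = 10.0\), \(\bar{x}_2 = 11.0\) and \(s_1^2 = s_2^2 = 4.0\), then \(f_1 = f_2 = 0.16\), \(t = \frac{10.0-11.0}{\sqrt{0.32}} \approx -1.77\) and \(df = \frac{0.32^2}{2 \times 0.16^2/24} = 48\). The 0.1% critical value for 48 degrees of freedom is about 3.5, so \(|t|\) is well below it and the test does not conclude the means are different.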
Two sample statistics sequences can be iterated together and compared to decide if one mean is larger than the other.
Statistics code
type SampleStatistics =
    {N:int;Mean:float;Variance:float}
    member s.StandardDeviation = sqrt s.Variance
    member s.MeanStandardError = sqrt(s.Variance/float s.N)

type WelchStatistic = {T:float;DF:int}

[<AutoOpen>]
module Statistics =
    /// Square of a number.
    let inline sqr x = x*x
    /// Online statistics sequence for a given sample sequence.
    let inline sampleStatistics s =
        let calc (n,m,s) x =
            let m'=m+(x-m)/float(n+1)
            n+1,m',s+(x-m)*(x-m')
        Seq.map float s |> Seq.scan calc (0,0.0,0.0) |> Seq.skip 3
        |> Seq.map (fun (n,m,s) -> {N=n;Mean=m;Variance=s/float(n-1)})
    /// Scale the statistics for a given underlying random variable change of scale.
    let scale f s = {s with Mean=s.Mean*f;Variance=s.Variance*sqr f}
    /// Single iteration statistics for a given iteration count and total statistics.
    let singleIteration ic s =
        {N=s.N*ic;Mean=s.Mean/float ic;Variance=s.Variance/float ic}
    /// Student's t-distribution inverse for 0.1% confidence level by
    /// degrees of freedom.
    let private tInv =
        [|636.6;31.60;12.92;8.610;6.869;5.959;5.408;5.041;4.781;4.587;4.437;4.318;4.221;
          4.140;4.073;4.015;3.965;3.922;3.883;3.850;3.819;3.792;3.768;3.745;3.725;3.707;
          3.690;3.674;3.659;3.646;3.633;3.622;3.611;3.601;3.591;3.582;3.574;3.566;3.558;
          3.551;3.544;3.538;3.532;3.526;3.520;3.515;3.510;3.505;3.500;3.496;3.492;3.488;
          3.484;3.480;3.476;3.473;3.470;3.466;3.463;3.460;3.457;3.454;3.452;3.449;3.447;
          3.444;3.442;3.439;3.437;3.435;3.433;3.431;3.429;3.427;3.425;3.423;3.421;3.420;
          3.418;3.416;3.415;3.413;3.412;3.410;3.409;3.407;3.406;3.405;3.403;3.402;3.401;
          3.399;3.398;3.397;3.396;3.395;3.394;3.393;3.392;3.390|]
    /// Welch's t-test statistic for two given sample statistics.
    let welchStatistic s1 s2 =
        let f1 = s1.Variance/float s1.N
        let f2 = s2.Variance/float s2.N
        {
            T = (s1.Mean-s2.Mean)/sqrt(f1+f2)
            DF= if f1=0.0 && f2=0.0 then 1
                else sqr(f1+f2)/(sqr f1/float(s1.N-1)+sqr f2/float(s2.N-1)) |> int
        }
    /// Welch's t-test for a given Welch statistic to a confidence level of 0.1%.
    let welchTest w =
        if abs w.T < Array.get tInv (min w.DF (Array.length tInv) - 1) then 0
        else sign w.T
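As a quick, hypothetical check of these functions (not part of the library itself), the statistics sequence can be run over simulated samples; the sample generator, seed and means below are made up purely for illustration:

let rand = System.Random 42
let sample mean = Seq.initInfinite (fun _ -> mean + rand.NextDouble())

// Iterate the online statistics until the standard error is within 1% of the mean.
let stats =
    sample 10.0 |> Statistics.sampleStatistics
    |> Seq.find (fun s -> s.MeanStandardError <= 0.01 * s.Mean)

// Iterate two statistics sequences together until the Welch test decides which mean
// is larger (-1 or +1). Samples with equal means would also need a degrees of freedom
// cut-off, as used in the performance testing code below.
let larger =
    Seq.map2 Statistics.welchStatistic
        (sample 10.0 |> Statistics.sampleStatistics)
        (sample 10.5 |> Statistics.sampleStatistics)
    |> Seq.pick (fun w -> match Statistics.welchTest w with 0 -> None | c -> Some c)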
Performance testing
Three performance metrics will be created: time, memory allocated, and garbage collections.
Each metric will be measured over enough iterations to reach at least a metric target, to ensure the metric itself is being measured reliably rather than any framework overhead.
Statistics functions will be created for each of the metrics to measure a function accurate to a mean standard error of 1%.
Compare functions will be created for each of the metrics to compare two functions to a confidence level of 0.1%. The functions will be considered equal after 10,000 degrees of freedom have passed without a positive test result.
Performance testing code
module Performance =
    open System
    open System.Diagnostics
    /// Find the iteration count to get to at least the metric target.
    let inline private targetIterationCount metric metricTarget f =
        let rec find n =
            let item = metric n f
            if (item<<<3)>=metricTarget then n*int metricTarget/int item+1
            else find (n*10)
        find 1
    /// Create and iterate a statistics sequence for a metric until the given accuracy.
    let inline private measureStatistics (metric,metricTarget) relativeError f =
        let ic = targetIterationCount metric metricTarget f
        Seq.initInfinite (fun _ -> metric ic f) |> sampleStatistics
        |> Seq.map (singleIteration ic)
        |> Seq.find (fun s -> s.MeanStandardError<=relativeError*s.Mean)
    /// Create and iterate two statistics sequences until the metric means can be
    /// compared.
    let inline private measureCompare (metric,metricTarget) f1 f2 =
        if f1 id<>f2 id then failwith "function results are not the same"
        let ic = targetIterationCount metric metricTarget f2
        let stats f =
            Seq.initInfinite (fun _ -> metric ic f)
            |> sampleStatistics
            |> Seq.map (singleIteration ic)
        Seq.map2 welchStatistic (stats f1) (stats f2)
        |> Seq.pick (fun w ->
            let maxDF = 10000
            if w.DF>maxDF then Some 0 else match welchTest w with |0->None |c->Some c)
    /// Measure the given function for the iteration count using the start and end
    /// metric.
    let inline private measureMetric startMetric endMetric ic f =
        GC.Collect()
        GC.WaitForPendingFinalizers()
        GC.Collect()
        let mutable total = LanguagePrimitives.GenericZero
        let measurer toMeasure =
            fun args ->
                let s = startMetric()
                let ret = toMeasure args
                let m = endMetric s
                total<-total+m
                ret
        let rec loop i =
            if i>0 then
                f measurer |> ignore
                loop (i-1)
        loop ic
        total
    /// Measure the time metric for the given function and iteration count.
    let private timeMetric ic f =
        measureMetric Stopwatch.GetTimestamp (fun t -> Stopwatch.GetTimestamp()-t) ic f
    /// Measure the memory metric for the given function and iteration count.
    let private memoryMetric ic f =
        let inline startMetric() =
            if GC.TryStartNoGCRegion(1L<<<22) |> not then failwith "TryStartNoGCRegion"
            GC.GetTotalMemory false
        let inline endMetric s =
            let t = GC.GetTotalMemory false - s
            GC.EndNoGCRegion()
            t
        measureMetric startMetric endMetric ic f
    /// Measure the garbage collection metric for the given function and iteration
    /// count.
    let private garbageMetric ic f =
        let count() = GC.CollectionCount 0 + GC.CollectionCount 1 + GC.CollectionCount 2
        measureMetric count (fun s -> count()-s) ic f
    /// Measure definitions which are a metric together with a metric target.
    let private oneMillisecond = Stopwatch.Frequency/1000L
    let private timeMeasure = timeMetric, oneMillisecond
    let private memoryMeasure = memoryMetric, 1024L //1KB
    let private garbageMeasure = garbageMetric, 10
    /// Time statistics for a given function accurate to a mean standard error of 1%.
    let timeStatistics f =
        measureStatistics timeMeasure 0.01 f |> scale (1.0/float Stopwatch.Frequency)
    /// Memory statistics for a given function accurate to a mean standard error of 1%.
    let memoryStatistics f = measureStatistics memoryMeasure 0.01 f
    /// GC count statistics for a given function accurate to a mean standard error of
    /// 1%.
    let gcCountStatistics f = measureStatistics garbageMeasure 0.01 f
    /// Time comparison for two given functions to a confidence level of 0.1%.
    let timeCompare f1 f2 = measureCompare timeMeasure f1 f2
    /// Memory comparison for two given functions to a confidence level of 0.1%.
    let memoryCompare f1 f2 = measureCompare memoryMeasure f1 f2
    /// GC count comparison for two given functions to a confidence level of 0.1%.
    let gcCountCompare f1 f2 = measureCompare garbageMeasure f1 f2
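As a usage sketch (the input list, sizes and functions below are hypothetical, not from the library), measuring and comparing looks like this:

// Time statistics for a hypothetical sort benchmark.
let data = List.init 10000 (fun i -> (i*26021) % 10007)

let sortTime = Performance.timeStatistics (fun measurer -> measurer List.sort data)

printfn "List.sort: mean %.6fs, standard error %.6fs, n = %i"
    sortTime.Mean sortTime.MeanStandardError sortTime.N

// Relative comparison, usable in a unit test: returns -1, 0 or +1 at the 0.1%
// confidence level. Both functions must return the same result.
let faster =
    Performance.timeCompare
        (fun measurer -> measurer List.sort data)
        (fun measurer -> measurer (List.sortWith compare) data)

A negative result from a compare function means the first function has the smaller mean metric; zero means no difference was found within 10,000 degrees of freedom.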
Conclusion
The performance testing functions have a very simple signature. The statistics functions just take the function to be measured. The compare functions just take the two functions to be compared.
The statistics functions give an overview of a function's performance. These can easily be combined to produce a useful performance report.
The compare functions can be used in unit tests since they are a relative test and hence should be independent of the machine. They are also fast since they stop as soon as the given confidence level is achieved. The compare functions could also be extended to test if a function is a given percentage better than another.
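One possible sketch of that extension (compareBy is hypothetical and not part of the library; it operates on statistics sequences like those built inside measureCompare): scaling the second sample's statistics by a fraction before the Welch test asks whether the first mean is at least that much smaller.

/// Hypothetical sketch: test if the mean of the first statistics sequence is at least
/// the given fraction of the second, e.g. compareBy 0.8 asks if it is 20% smaller.
let compareBy fraction stats1 stats2 =
    Seq.map2 Statistics.welchStatistic stats1 (Seq.map (Statistics.scale fraction) stats2)
    |> Seq.pick (fun w ->
        if w.DF>10000 then Some 0
        else match Statistics.welchTest w with 0 -> None | c -> Some c)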
Modularity from higher-order functions and lazy evaluation together with a little maths have produced a simple yet powerful performance testing library.
UPDATED (2016-10-21): The functions have been extended to be able to measure a subfunction of the passed-in functions. This enables set up to be done without affecting the measurement. The subfunction can be called multiple times and the metric will be aggregated. An example of its use can be found in the Outliers and MAD post.
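As a hypothetical illustration of this updated signature (the set up and functions here are made up), only the call wrapped by the measurer contributes to the metric:

let medianTime =
    Performance.timeStatistics (fun measurer ->
        // Set up: runs each iteration but is not included in the measurement.
        let xs = List.init 10001 (fun i -> float ((i*26021) % 10007))
        // Only the call through the measurer is timed; it could be called several
        // times and the metric would be aggregated.
        measurer (List.sort >> List.item 5000) xs)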
UPDATED (2017-01-03): Performance testing functionality has been added to the F# testing library Expecto using the code in this post.