Is F# Math Really Faster than C#?
December 30, 2010 5 Comments
I ran across an article claiming that F# was about an order of magnitude faster at calculating a basic math algorithm. Can this be true?
Sigmoid in C#
public static float Sigmoid(float value) { return 1.0f / (1.0f + (float) Math.Exp(-value)); }
//csc -o -debug- .method public static float32 Sigmoid(float32 'value') cil managed { .maxstack 8 L_0000: ldc.r8 1 L_0009: ldc.r8 1 L_0012: ldarg.0 L_0013: neg L_0014: conv.r8 L_0015: call float64 [mscorlib]System.Math::Exp(float64) L_001a: add L_001b: div L_001c: conv.r4 L_001d: ret }
Sigmoid in F#
let Sigmoid value = 1.0f/(1.0f + exp(-value));
//fsc --debug- --optimize+ .method public static float32 Sigmoid(float32 'value') cil managed { .maxstack 5 .locals init ( [0] float32 num) L_0000: ldc.r4 1 L_0005: ldc.r4 1 L_000a: ldarg.0 L_000b: neg L_000c: stloc.0 L_000d: ldloc.0 L_000e: conv.r8 L_000f: call float64 [mscorlib]System.Math::Exp(float64) L_0014: conv.r4 L_0015: add L_0016: div L_0017: ret }
The IL generated is nearly the same. The F# version allocates a little less memory using float32 variables on the stack where C# uses float64 and generally manages with a smaller stack, but the basic math operations are the same. Can these differences make an order of magnitude performance difference? Short answer is no.
Benchmarks are Tricky
C# sigmoid algorithm benchmark
// (c) Rinat Abdullin // http://abdullin.com/journal/2009/1/5/caching-activation-function-is-not-worth-it.html // sigmoidcs.cs using System; using System.Diagnostics; static class App { const float Scale = 320.0f; const int Resolution = 2047; const float Min = -Resolution/Scale; const float Max = Resolution/Scale; static readonly float[] _lut = InitLut(); static float[] InitLut() { var lut = new float[Resolution + 1]; for (int i = 0; i < Resolution + 1; i++) { lut[i] = (float) (1.0/(1.0 + Math.Exp(-i/Scale))); } return lut; } static float Sigmoid1(double value) { return (float) (1.0/(1.0 + Math.Exp(-value))); } static float Sigmoid2(float value) { if (value <= Min) return 0.0f; if (value >= Max) return 1.0f; var f = value*Scale; if (value >= 0) return _lut[(int) (f + 0.5f)]; return 1.0f - _lut[(int) (0.5f - f)]; } static float TestError() { var emax = 0.0f; for (var x = -10.0f; x < 10.0f; x += 0.000001f) { var v0 = Sigmoid1(x); var v1 = Sigmoid2(x); var e = Math.Abs(v1 - v0); if (e > emax) emax = e; } return emax; } static double TestPerformancePlain() { var sw = new Stopwatch(); sw.Start(); for (int i = 0; i < 10; i++) { for (float x = -5.0f; x < 5.0f; x += 0.000001f) { Sigmoid1(x); } } return sw.Elapsed.TotalMilliseconds; } static double TestPerformanceOfLut() { var sw = new Stopwatch(); sw.Start(); for (int i = 0; i < 10; i++) { for (float x = -5.0f; x < 5.0f; x += 0.000001f) { Sigmoid2(x); } } return sw.Elapsed.TotalMilliseconds; } static void Main() { var emax = TestError(); var t0 = TestPerformancePlain(); var t1 = TestPerformanceOfLut(); Console.WriteLine("Max detected deviation using LUT: {0}", emax); Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0); Console.WriteLine("10^7 iterations using Sigmoid2() took {0} ms", t1); } }
F# sigmoid algorithm benchmark
// (c) Rinat Abdullin // http://abdullin.com/journal/2009/1/6/f-has-better-performance-than-c-in-math.html // sigmoidfs.fs #light let Scale = 320.0f; let Resolution = 2047; let Min = -single(Resolution)/Scale; let Max = single(Resolution)/Scale; let range step a b = let count = int((b-a)/step); seq { for i in 0 .. count -> single(i)*step + a }; let lut = [| for x in 0 .. Resolution -> single(1.0/(1.0 + exp(-double(x)/double(Scale)))) |] let sigmoid1 value = 1.0f/(1.0f + exp(-value)); let sigmoid2 v = if (v <= Min) then 0.0f; elif (v>= Max) then 1.0f; else let f = v * Scale; if (v>0.0f) then lut.[int (f + 0.5f)] else 1.0f - lut.[int(0.5f - f)]; let getError f = let test = range 0.00001f -10.0f 10.0f; let errors = seq { for v in test -> abs(sigmoid1(single(v)) - f(single(v))) } Seq.max errors; open System.Diagnostics; let test f = let sw = Stopwatch.StartNew(); let mutable m = 0.0f; let result = for t in 1 .. 10 do for x in 1 .. 1000000 do m <- f(single(x)/100000.0f-5.0f); sw.Elapsed.TotalMilliseconds; printf "Max deviation is %f\n" (getError sigmoid2) printf "10^7 iterations using sigmoid1: %f ms\n" (test sigmoid1) printf "10^7 iterations using sigmoid2: %f ms\n" (test sigmoid2)
PS> csc -o -debug- .\sigmoidcs.cs Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1 Copyright (C) Microsoft Corporation. All rights reserved. PS> fsc --debug- --optimize+ .\sigmoidfs.fs Microsoft (R) F# 2.0 Compiler build 4.0.30319.1 Copyright (c) Microsoft Corporation. All Rights Reserved. PS> .\sigmoidcs.exe Max detected deviation using LUT: 0.001663984 10^7 iterations using Sigmoid1() took 2644.5935 ms 10^7 iterations using Sigmoid2() took 510.9379 ms PS> .\sigmoidfs.exe Max deviation is 0.001664 10^7 iterations using sigmoid1: 403.974300 ms 10^7 iterations using sigmoid2: 124.520100 ms
Wow. That’s 404ms for F# and 2,656 for C#. That’s a huge difference. Why?
We already established that the IL for the Sigmoid1() algorithm above compiles to very similar IL. What else could be different?
Wait a minute.
let result = for t in 1 .. 10 do for x in 1 .. 1000000 do m <- f(single(x)/100000.0f-5.0f);
The F# loop is not equivalent to the C# version. In the F# version the counter is an int and the C# version the counter is a float. (Also the F# version is passing a delegate to the test function.)
for (int i = 0; i < 10; i++) { for (float x = -5.0f; x < 5.0f; x += 0.000001f) { Sigmoid1(x); } }
Does that int counter thing make a big difference.
using System; using System.Diagnostics; static class App { static float Sigmoid1(double value) { return (float) (1.0/(1.0 + Math.Exp(-value))); } static double TestPerformancePlain() { var sw = new Stopwatch(); sw.Start(); float num = 0f; for (int i = 0; i < 10; i++) { for (int j = 1; j < 1000001; j++) { num = Sigmoid1((((float) j) / 100000f) - 5f); } } return sw.Elapsed.TotalMilliseconds; } static void Main() { var t0 = TestPerformancePlain(); Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0); } }
PS> csc -o -debug- .\sigmoid-counter.cs Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1 Copyright (C) Microsoft Corporation. All rights reserved. PS> ./sigmoid-counter 10^7 iterations using Sigmoid1() took 431.6634 ms
Yup C# just executed that Sigmoid1() test in 431ms. Recall that F# was 404ms.
The other difference was that the F# code was passing a delegate of FsharpFunc<float,float> rather than making a method call. Lets try the same thing in C# using a Lambda expression to create an anonymous Func<float,float> delegate.
using System; using System.Diagnostics; static class App { static double Test(Func<float,float> f) { var sw = new Stopwatch(); sw.Start(); float num = 0f; for (int i = 0; i < 10; i++) { for (int j = 1; j < 1000001; j++) { num = f((((float) j) / 100000f) - 5f); } } return sw.Elapsed.TotalMilliseconds; } static void Main() { var t0 = Test( x => { return (float) (1.0/(1.0 + Math.Exp(-x))); } ); Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0); } }
PS> ./sigmoid-counter 10^7 iterations using Sigmoid1() took 431.6634 ms PS> csc -o -debug- .\sigmoid-lambda.cs Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1 Copyright (C) Microsoft Corporation. All rights reserved. PS> .\sigmoid-lambda.exe 10^7 iterations using Sigmoid1() took 275.1087 ms
Now C# is 130ms faster than F#.
For what it’s worth the C version of the Sigmoid1() benchmark executes in 166ms on my machine using CL 16 x64 on my machine.
- C: 166ms
- C# (delegate): 275ms
- C# (method): 431ms
- C# (method, float counter): 2,656ms
- F#: 404ms
- Never, ever use a float as a counter in a for loop in C#.
- Invoking a delegate in the CLR is slightly faster than a normal method invocation.
- It’s very easy to measure something other than what you intended when benchmarking.
- In as much as this little test means anything, C# and F# have similar performance. C is still faster.