Is F# Math Really Faster than C#?

I ran across an article claiming that F# was about an order of magnitude faster at calculating a basic math algorithm. Can this be true?

Sigmoid in C#

public static float Sigmoid(float value) 
{
    return 1.0f / (1.0f + (float) Math.Exp(-value));
}
//csc -o -debug-
.method public static float32 Sigmoid(float32 'value') cil managed
{
    .maxstack 8
    L_0000: ldc.r8 1
    L_0009: ldc.r8 1
    L_0012: ldarg.0 
    L_0013: neg 
    L_0014: conv.r8 
    L_0015: call float64 [mscorlib]System.Math::Exp(float64)
    L_001a: add 
    L_001b: div 
    L_001c: conv.r4 
    L_001d: ret 
}

Sigmoid in F#

let Sigmoid value = 1.0f/(1.0f + exp(-value));
//fsc --debug- --optimize+
.method public static float32 Sigmoid(float32 'value') cil managed
{
    .maxstack 5
    .locals init (
        [0] float32 num)
    L_0000: ldc.r4 1
    L_0005: ldc.r4 1
    L_000a: ldarg.0 
    L_000b: neg 
    L_000c: stloc.0 
    L_000d: ldloc.0 
    L_000e: conv.r8 
    L_000f: call float64 [mscorlib]System.Math::Exp(float64)
    L_0014: conv.r4 
    L_0015: add 
    L_0016: div 
    L_0017: ret 
}

The IL generated is nearly the same. The F# version allocates a little less memory using float32 variables on the stack where C# uses float64 and generally manages with a smaller stack, but the basic math operations are the same. Can these differences make an order of magnitude performance difference? Short answer is no.

Benchmarks are Tricky

C# sigmoid algorithm benchmark

// (c) Rinat Abdullin
// http://abdullin.com/journal/2009/1/5/caching-activation-function-is-not-worth-it.html
// sigmoidcs.cs

using System;
using System.Diagnostics;

static class App
{
	const float Scale = 320.0f;
	const int Resolution = 2047;
	const float Min = -Resolution/Scale;
	const float Max = Resolution/Scale;

	static readonly float[] _lut = InitLut();

	static float[] InitLut()
	{
	  var lut =  new float[Resolution + 1];
	  for (int i = 0; i < Resolution + 1; i++)
	  {
	    lut[i] = (float) (1.0/(1.0 + Math.Exp(-i/Scale)));
	  }
	  return lut;
	}

	static float Sigmoid1(double value)
	{
	  return (float) (1.0/(1.0 + Math.Exp(-value)));
	}

	static float Sigmoid2(float value)
	{
	  if (value <= Min) return 0.0f;
	  if (value >= Max) return 1.0f;
	  var f = value*Scale;
	  if (value >= 0) return _lut[(int) (f + 0.5f)];
	  return 1.0f - _lut[(int) (0.5f - f)];
	}

	static float TestError()
	{
	  var emax = 0.0f;
	  for (var x = -10.0f; x < 10.0f; x += 0.000001f)
	  {
	    var v0 = Sigmoid1(x);
	    var v1 = Sigmoid2(x);

	    var e = Math.Abs(v1 - v0);
	    if (e > emax) emax = e;
	  }
	  return emax;
	}

	static double TestPerformancePlain()
	{
	  var sw = new Stopwatch();
	  sw.Start();
	  for (int i = 0; i < 10; i++)
	  {
	    for (float x = -5.0f; x < 5.0f; x += 0.000001f)
	    {
	      Sigmoid1(x);
	    }
	  }

	  return sw.Elapsed.TotalMilliseconds;
	}

	static double TestPerformanceOfLut()
	{
	  var sw = new Stopwatch();
	  sw.Start();
	  for (int i = 0; i < 10; i++)
	  {
	    for (float x = -5.0f; x < 5.0f; x += 0.000001f)
	    {
	      Sigmoid2(x);
	    }
	  }
	  return sw.Elapsed.TotalMilliseconds;
	}

	static void Main()
	{
	  var emax = TestError();
	  var t0 = TestPerformancePlain();
	  var t1 = TestPerformanceOfLut();

	  Console.WriteLine("Max detected deviation using LUT: {0}", emax);
	  Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0);
	  Console.WriteLine("10^7 iterations using Sigmoid2() took {0} ms", t1);
	}
}

F# sigmoid algorithm benchmark

// (c) Rinat Abdullin
// http://abdullin.com/journal/2009/1/6/f-has-better-performance-than-c-in-math.html
// sigmoidfs.fs

#light

let Scale = 320.0f;
let Resolution = 2047;

let Min = -single(Resolution)/Scale;
let Max = single(Resolution)/Scale;

let range step a b =
  let count = int((b-a)/step);
  seq { for i in 0 .. count -> single(i)*step + a };

let lut = [| 
  for x in 0 .. Resolution ->
    single(1.0/(1.0 +  exp(-double(x)/double(Scale))))
  |]

let sigmoid1 value = 1.0f/(1.0f + exp(-value));

let sigmoid2 v = 
  if (v <= Min) then 0.0f;
  elif (v>= Max) then 1.0f;
  else
    let f = v * Scale;
    if (v>0.0f) then lut.[int (f + 0.5f)]
    else 1.0f - lut.[int(0.5f - f)];

let getError f = 
  let test = range 0.00001f -10.0f 10.0f;
  let errors = seq { 
    for v in test -> 
      abs(sigmoid1(single(v)) - f(single(v)))
  }
  Seq.max errors;

open System.Diagnostics;

let test f = 
  let sw = Stopwatch.StartNew(); 
  let mutable m = 0.0f;
  let result = 
    for t in 1 .. 10 do
      for x in 1 .. 1000000 do
        m <- f(single(x)/100000.0f-5.0f);
  sw.Elapsed.TotalMilliseconds;

printf "Max deviation is %f\n" (getError sigmoid2)
printf "10^7 iterations using sigmoid1: %f ms\n" (test sigmoid1)
printf "10^7 iterations using sigmoid2: %f ms\n" (test sigmoid2)
PS> csc -o -debug- .\sigmoidcs.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

PS> fsc --debug- --optimize+  .\sigmoidfs.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.
PS> .\sigmoidcs.exe
Max detected deviation using LUT: 0.001663984
10^7 iterations using Sigmoid1() took 2644.5935 ms
10^7 iterations using Sigmoid2() took 510.9379 ms
PS> .\sigmoidfs.exe
Max deviation is 0.001664
10^7 iterations using sigmoid1: 403.974300 ms
10^7 iterations using sigmoid2: 124.520100 ms

Wow. That’s 404ms for F# and 2,656 for C#. That’s a huge difference. Why?

We already established that the IL for the Sigmoid1() algorithm above compiles to very similar IL. What else could be different?

fsharp-test-func

Wait a minute.

let result = 
    for t in 1 .. 10 do
      for x in 1 .. 1000000 do
        m <- f(single(x)/100000.0f-5.0f);

The F# loop is not equivalent to the C# version. In the F# version the counter is an int and the C# version the counter is a float. (Also the F# version is passing a delegate to the test function.)

for (int i = 0; i < 10; i++)
{
	for (float x = -5.0f; x < 5.0f; x += 0.000001f)
	{
		Sigmoid1(x);
	}
}

Does that int counter thing make a big difference.

using System;
using System.Diagnostics;

static class App
{
	static float Sigmoid1(double value)
	{
	  return (float) (1.0/(1.0 + Math.Exp(-value)));
	}

	static double TestPerformancePlain()
	{
		var sw = new Stopwatch();
		sw.Start();
		float num = 0f;
		for (int i = 0; i < 10; i++)
		{
			for (int j = 1; j < 1000001; j++)
			{
				num = Sigmoid1((((float) j) / 100000f) - 5f);
			}
		}  
		return sw.Elapsed.TotalMilliseconds;
	}

	static void Main()
	{
	  var t0 = TestPerformancePlain();
	  Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0);
	}
}
PS> csc -o -debug- .\sigmoid-counter.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

PS> ./sigmoid-counter
10^7 iterations using Sigmoid1() took 431.6634 ms

Yup C# just executed that Sigmoid1() test in 431ms. Recall that F# was 404ms.

The other difference was that the F# code was passing a delegate of FsharpFunc<float,float> rather than making a method call. Lets try the same thing in C# using a Lambda expression to create an anonymous Func<float,float> delegate.

using System;
using System.Diagnostics;

static class App
{
	static double Test(Func<float,float> f)
	{
		var sw = new Stopwatch();
		sw.Start();
		float num = 0f;
		for (int i = 0; i < 10; i++)
		{
			for (int j = 1; j < 1000001; j++)
			{
		   		num = f((((float) j) / 100000f) - 5f);
			}
		}  
		return sw.Elapsed.TotalMilliseconds;
	}

	static void Main()
	{
	  var t0 = Test( x => { return (float) (1.0/(1.0 + Math.Exp(-x))); } );
	  Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0);
	}
}
PS> ./sigmoid-counter
10^7 iterations using Sigmoid1() took 431.6634 ms
PS> csc -o -debug- .\sigmoid-lambda.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

PS> .\sigmoid-lambda.exe
10^7 iterations using Sigmoid1() took 275.1087 ms

Now C# is 130ms faster than F#.

For what it’s worth the C version of the Sigmoid1() benchmark executes in 166ms on my machine using CL 16 x64 on my machine.

  • C: 166ms
  • C# (delegate): 275ms
  • C# (method): 431ms
  • C# (method, float counter): 2,656ms
  • F#: 404ms
  • So, what have we learned?

  • Never, ever use a float as a counter in a for loop in C#.
  • Invoking a delegate in the CLR is slightly faster than a normal method invocation.
  • It’s very easy to measure something other than what you intended when benchmarking.
  • In as much as this little test means anything, C# and F# have similar performance. C is still faster.
Advertisements

Chrome MSI Works Great with AppLocker

Google has released a version of the Chrome installer packaged as an MSI rather than using the ClickOnce installer. The major difference is that the MSI creates a global installation under %ProgramFiles(x86)%. Transparent updates continue by default, managed by the Google Chrome Update Service (gupdate). Gupdate also updates Google Apps Sync for Outlook if it is installed. The key advantage of an MSI package is that it is compatible with managed deployment using Active Directory and Google has provided a set of policy templates to allow managing Chrome via Group Policy in the same way that Internet Explorer is managed.

Because all of the code is installed in %ProgramFiles(x86)%, Chrome is fully compatible with the default AppLocker EXE and DLL rules even though the bundled Flash DLL is not signed. “Chrome Enterprise” runs fine with the default rule set and no special publisher rules at all.

Chrome MSI download here.

Policy templates here.

via Chromium Blog.

Zenburn Theme for EmEditor

I have used EmEditor as my text editor on Windows for several years. It is extremely fast and has some nice features like spell check, column select, a data-separated values columnar mode. It handles every text format imaginable. There are grammar files for syntax highlighting available for a huge number of languages.

It doesn’t come out-of-the-box with a dark color scheme that I really enjoy, so I have put together my favorite Zenburn color scheme.

emeditor

Zenburn.eetheme

[Zenburn]
MaxFind=3
Normal=#fcffe0,#3e3e3e,normal
Sel=#cddc8f,#2e4340,normal
CurrentLine=transparent,transparent,normal
Quoted=#008080,transparent,normal
Find=#f8f8f8,#228000,normal
URL=#7efcff,transparent,underline
Mail=#60db5d,transparent,underline
Tag=#808000,transparent,underline
SingleQuotes=#eb939a,transparent,normal
DoubleQuotes=#eb939a,transparent,normal
Comment=#7f9f7f,transparent,italic
Script=#a89100,transparent,normal
Braces=#ffb010,transparent,bold
InTag=#f0dfaf,transparent,normal
Highlight1=#ff80e1,transparent,bold
Highlight2=#c27e66,transparent,bold
Highlight3=#ffd7a7,transparent,bold
Highlight4=#9fc28a,transparent,bold
Highlight5=#afc4db,transparent,normal
Highlight6=#c6c4e6,transparent,normal
Highlight7=#f2dfa4,transparent,normal
Highlight8=#333333,#f18c96,normal
Highlight9=#242424,#d0d0a0,normal
Highlight10=#233323,#71d3b4,normal
Return=#0080ff,transparent,normal
Line=#c0c0c0,transparent,normal
PageBreak=#fcffe0,transparent,normal
LineNumber=#83a98a,#3e3e3e,normal
Ruler=#9fafaf,#3e3e3e,normal
Outside=#fcffe0,#3e3e3e,normal
CompareChange=#f8f8f8,#005baa,normal
CompareChar=#fcffe0,#800f00,normal
CompareAdded=#000000,#ffe0e0,normal
CompareDeleted=#000000,#e0f0ff,normal
CompareBlank=#000000,#e4e4e4,normal
Spell=transparent,transparent,wiggly
Unicode=transparent,#ff0000,normal
VerticalSel=#c0c0c0,transparent,normal
HexSel=#c0c0c0,transparent,normal
IndentGuides=#ff8040,transparent,dotted
HorzGrid=#0080ff,transparent,dotted
Outline=#8000ff,transparent,dotted
LineNumberLines=#000000,transparent,normal
RulerLines=#000000,transparent,normal
Find-2=transparent,#40ff40,normal
Find-3=transparent,#40ff40,normal

Install Zenburn theme in EmEditor:

  • Save the above as a text file called zenburn.eetheme
  • Tools | Properties for All Configurations
  • select the “Display” tab
  • Push the “>” button to the right of the Theme select box
  • Choose Import…
  • Pick the zenburn.eetheme file
  • Enjoy.

Updated 01/13/2011 to fix find coloration.

Really Least-Privilege Development: AppLocker and Visual Studio

AppLocker is a software execution policy tool in Windows 7 Enterprise and Ultimate and Windows Server 2008 R2. An AppLocker policy can be used to shift Windows from a  model where execution of code is permitted by default to a model where execution is denied by default. AppLocker is aware of binary exe, DLL/OCX, the scripting engines that ship with Windows and Windows Installer packages. The default rule sets for these categories will only allow code that is installed into the system or program files directories to execute. Once AppLocker is turned on, execution of code is denied by default and an unprivileged user cannot add executable code to the system.

If untrusted users can’t execute new code, then how can Visual Studio possibly work without making developers admin?

Grant Execute on Source/Build Tree: Fail

I thought this would be super simple and that all I would have to do was create Executable, DLL and Script path rules to grant execute on my source tree. At first, it seemed to work and I was off and running. Then I tried to build a big complicated project and the build failed all of a sudden. This solution had post-build rules but so what? I had script enabled for the build tree.

Build Actions Fail

Procmon shows that the pre- and post-build events are implemented as temporary batch scripts in %TEMP%. They are named <cryptic-number>.exec.cmd.

procmon-postbuild

Unfortunately %TEMP% is not a usable macro in AppLocker. You have to either create a generic Script rule to allow all *.exec.cmd scripts to execute or create rules for each Visual Studio user like C:\Users\<username>\AppData\Local\Temp\*.exec.cmd. Either way, post-build actions will start to work.

Web Apps Crash with Yellow Screen of Death

Another issue is that running Web apps with the built-in Visual Studio Development Web Server (aka Cassini) fails miserably. The obvious clue that this is AppLocker is “The program was blocked by group policy.”

webdev.webserver20-yellowdeath

The problem is that Cassini copies the DLLs and runs them from a subdirectory of %TEMP%\Temporary ASP.NET Files\.

webdev.webserver20-temp

 

In order for Cassini to work, you have to disable DLL rules or  create a DLL allow path rule for every developer in the form C:\Users\<username>\AppData\Local\Temp\Temporary ASP.NET Files\*.dll.

Visual Studio’s HelpLibAgent.exe Crashes

This one is a bit weirder and more surprising. Visual Studio 2010 has a new help system that operates as a local HTTP server. Invoking help with AppLocker DLL rules enabled generates a serious crash.

helplibagent-crash

I’m not sure why it does this but HelpLibAgent.exe generates a random string and then two .cs files based on that string and invokes the C# compiler to generate a DLL based on the random string which is dynamically loaded by HelpLibAgent. This seems weird on the face of it that and there’s nothing in that code that looks like it has to be generated on a per-user basis at all. Weird, weird, weird.

helplibagent-dll-compile

In order for this to work you have to allow any randomly named dll to load out of %TEMP% which means disabling DLL rules or modifying the rule that was necessary for Cassini:

C:\Users\<username>\AppData\Local\Temp\*.dll.

Summary

In order to run Visual Studio with AppLocker a user needs the following rules:

  • DLL, EXE and Script: Allow path on source tree / build directory structure
  • Script: Allow path on %TEMP%\*.cmd.exec
  • DLL: Allow path on %TEMP%\*.dll
  • Unfortunately %TEMP% is not available in AppLocker so a C:\Users\<username>\AppData\Local\Temp\* for every <username> needed has to exist. These are probably best implemented as local policies.
  • Optional: Allow script *.ps1. (This is pretty safe because PowerShell has its own tight script execution security model.)
  • It’s unfortunate that DLL rules have to be enabled for a well-known location like %TEMP% but that still doesn’t make the DLL rule useless.

    • OCX is still not permitted from %TEMP%
    • AppLocker DLL rules are complementary to CWDIllegalInDllSearch for mitigating DLL Hijacking because it provides a more granular options. This is particularly important if you need to use a global CWDIllegalInDllSearch setting of 1 or 2 for compatibility reasons.
    Once these rules are in place, the experience is seamless. The rules don’t get in the way of anything.
    Note that AppLocker script rules only apply to the scripting hosts that ship with Windows: CMD, Windows Scripting Host (.vbs and .js) and PowerShell). Perl, Python, Ruby, etc interpreters are not affected by AppLocker policy. Similarly, execution of Java jar files are not affected by AppLocker.
    It would be nice if DLL rules were a little smarter. For instance, I would like to be able to allow managed DLLs on some path but not native code.

WorkAround: Console2 + Telnet.exe on Windows 7 Crashes ConHost.exe

conshost-croak

Entering the Windows telnet.exe prompt mode causes the shell process underneath Console2 to crash on Windows 7. You can reproduce this by just invoking telnet with no arguments in a Console2 session. An alternate route to hit this is exiting a telnet session with CTRL+].

The symptom is Console2 freezing because it is no longer getting text from its slaved native console followed up with a crash dialog for conhost.exe and whetever your shell was.

Unhandled exception at 0xffdd1df9 in conhost.exe: 0xC0000005: Access violation reading location 0x0000000003385460.>    conhost.exe!SB_StreamWriteToScreenBuffer()  + 0x61 bytes   

A workaround is to launch telnet into a separate conhost.exe window.

#force telnet.exe to start in its own console window. 
function telnet { start $env:SystemRoot\System32\telnet.exe $args }

Or use putty –P port –telnet host as an alternative to telnet.exe. (I can’t get plink.exe to be interactive with telnet for me.)

Update

Unless you need something nifty that telnet.exe is doing, a more elegant solution is to Lee Holmes’ Connect-Computer.ps1 PowerShell script as a telnet replacement.

Headbang: Chrome, Flash and AppLocker Don’t Play Together

Google Chrome has bundled  a private copy of Flash since 5.0.375.86. Flash is packaged as gcswf32.dll in the Chrome application folder. All of the EXE and DLL files that come with Chrome as cryptographically signed by Google except for gcswf32.dll which is not signed at all. However, the NSAPI compatible Flash plugin distributed by Adobe is signed by Adobe.

This matters if you are using AppLocker DLL rules to define execution rules because there is no good way to create a permit rule for this DLL. AppLocker is a component of Windows 7 Enterprise and Ultimate SKUs which provides a straightforward mechanism for configuring Windows with a usable default-deny execution policy.

If you enable AppLocker EXE and DLL rules with the default rule set, Chrome will not run at all because Chrome is installed into the %userprofile%\AppData\AppData\Google\Chrome directory structure which is writable by unprivileged users. The obvious solution is to create EXE and DLL rules to permit code published by Google to execute. This works great, except that Flash will not execute. Why oh why is gcswf32.dll not signed by either Google or Adobe? There’s no way to whitelist this DLL except creating a permit *\gcswf32.dll rule or giving up on DLL rules altogether.

Workaround

A viable workaround is to use about:plugins to disable the Flash plugin Google distributes and globally install the “Flash player for Firefox, Mozilla, Netscape, Opera (and other plugin-based browsers)”. This plugin gets installed into C:\Windows\SysWOW64\Macromed\Flash on x64 or C:\Windows\system32\Macromed\Flash on x86 (and is signed by Adobe) is whitelisted by the default DLL rule set because it is inside of the Windows directory.

applocker-exeapplocker-dlldisable-chrome-bundled-flash

Chrome Team, Please Fix Me

You have to disable the bundled Flash player bundled by Chrome in order for Flash to work under this scenario, which means this is a workaround that is not enterprise ready unless your company is full of nerds. Also, this means that I have to worry about updating Flash instead of Chrome just handling it.

AppLocker is good stuff. Chrome is good stuff. I’d these technologies to play better together.

All Google needs to do is distribute a signed version of the Flash player plugin. Why aren’t they already doing this, anyway?

Please Chrome team, this is trivial to fix. Thanks in advance.

Reported as issue 65322.

Edit

A reasonable solution is to use the Chrome enterprise MSI installer which installs Chrome into the standard “program files'” location which means the default AppLocker permit rule on “program files” covers Chrome.

Update

I’m not sure when Google started distributing signed versions of Flash but in Chrome 12, gcswf32.dll comes with a valid signature from Adobe.

%d bloggers like this: