Compiling Mono 3.x from Source with LLVM Code Generation in Ubuntu LTS x64

The packages for Mono in Debian and Ubuntu are lagging far behind what is available from Xamarin for OS X, SUSE and OpenSUSE. The 2.10.x versions of Mono are .NET 2.0 runtime compatible and use Mono’s older Boem garbage collector — which tends to leak objects. The new generational “sgen” garbage collector is far superior All of the tutorials for building on Fedora/RHEL/CentOS and Debian/Ubuntu that I have seen build with the built-in code generation engine rather than LLVM. LLVM provides a significant speedup in the generated code of 20-30% but Mono can’t link against the version of LLVM that ships in Debian Wheezy or Ubuntu. You have to build a tweaked version from source — which is the tricky bit.

My goal was to build the same version of Mono that is shipped by Xamarin for OS X on Ubuntu 12.04 Precise Pangolin, including LLVM. That is mono 3.2.5 at the time of this writing.

Screen Shot 2013 12 01 at 10 58 51 AMPrerequisites

Mono is hosted on GitHub so we need git. It also requires GNU autotools, make and g++ to drive the build. 

sudo apt-get install libtool autoconf g++ gettext make git

We’re building something that isn’t managed by the system package manager, so according to the UNIX Filesystem Hierarchy Standard, the build product should go into the /opt file system. I’m going to use /opt/local and so before going any further let’s add that to the $PATH by editing /etc/environment with vi or your editor of choice:

PATH=“/opt/local/sbin:/opt/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games”

Also, we need to change the current environment:

export PATH=/opt/local/sbin:/opt/local/bin:$PATH

Libgdiplus

The first component of Mono that we need to build from source is libgdiplus. Libgdiplus is an open source implementation of the Windows GDI+ API for graphics and text rendering. Libgdiplus has a number of prerequisites because it basically just maps GDI+ function calls onto equivalent Linux libraries:

sudo apt-get install pkg-config libglib2.0-dev libpng12-dev libx11-dev libfreetype6-dev libfontconfig1-dev libtiff-dev libjpeg8-dev libgif-dev libexif-dev

mkdir ~/src; mdkr ~/src/mono; cd ~/src/mono

git clone https://github.com/mono/libgdiplus.git

cd libgdiplus

./configure –prefix=/opt/local

make

sudo make install

 

Mono-LLVM

Mono has a tweaked fork of LLVM. By default it will try to build x86 and x86_64 targets on Ubuntu x64 but the build will fail because there is no x86 runtime support installed.

cd ~/src/mono

git clone https://github.com/mono/llvm.git

cd llvm

git checkout mono

./configure –prefix=/opt/local –enable-optimized –enable-targets=x86_64

make

sudo make install

Mono with LLVM

Each official release of Mono is tagged in git. You should checkout the version that you want to build. In this case, I’m building Mono 3.2.5 with the LLVM and libgdiplus libraries we already built. Since there is no working version of Mono on the system, the very first time you build you need to do make get-monolite-latest which will pull down a minimal mono-based C# compiler to boostrap the build.

cd ~/src/mono

git clone https://github.com/mono/mono.git

cd mono

git checkout mono-3.2.5

export MONO_USE_LLVM=1

./autogen.sh –prefix=/opt/local –enable-llvm=yes –with-sgen=yes –with-gc=sgen

make get-monolite-latest

make EXTERNAL_MCS=${PWD}/mcs/class/lib/monolite/gmcs.exe

sudo make install

Assuming that all went well, you should now have a working build of mono 3.2.5 (or newer) in /opt/local/bin/ and executing mono –version should include LLVM: yes.

Depending on what you want to do with Mono, you may now want to build XSP (ASP.NET server, Apache module and FastCGI module for nginx) and/or F# 3.x.

XSP

The xsp2 server uses a .NET Framework 2.0/3.0/3.5 API and the xsp4 server provides a 4/4.5 API. The main tweak here is to set the PKG_CONFIG_PATH environmental variable so that the configure script can find mono in /opt/local.

cd ~/src/mono

git clone https://github.com/mono/xsp.git

cd xsp

git checkout 3.0.11

PKG_CONFIG_PATH=/opt/local/lib/pkgconfig ./autogen.sh –prefix=/opt/local

make

sudo make install

F# 3.1

cd ~/src/mono

git clone https://github.com/fsharp/fsharp

cd fsharp

git checkout fsharp_31

./autogen.sh –prefix=/opt/local

make

sudo make install

C# in Depth 2nd Edition for Kindle but not from Amazon

skeet2_cover150Jon Skeet blogged this morning that C# in Depth 2nd edition is now available on Kindle. The original C# in Depth is just about the best programming book I’ve ever read, so I will definitely check this out. Don’t look on the Amazon Kindle store, though. Manning is self-publishing eBooks through its own website in PDF, epub and mobi. Mobi is the un-DRMed proto-format of Kindle and is read by Kindle devices and the Kindle desktop software. Interesting.

petricek_cover150Before I get C# in Depth 2nd Edition, though, I think I’ll check out Real-World Functional Programming by Tomas Petricek and Jon Skeet. This book is about both F# and C# (surely mostly LINQ). I’ve been playing around with Haskell lately and while it is interesting, it doesn’t seem all that practical. F# seems like a more approachable language and it targets .NET/Mono so it can mix with C# code that already exists and generally seems likely to be more immediately practical for real-world projects.

Also, if you have never heard of Jon Skeet, he’s an over-achiever who knows more about C# than just anyone not on the compiler team at Microsoft and he has by far the highest score on Stack Overflow. I don’t know how he has time for all of this because his day job has him working at Google in Java on the Android Market. He was interviewed on This Devleoper’s Life 1.1.3. Give it a listen.

Is F# Math Really Faster than C#?

I ran across an article claiming that F# was about an order of magnitude faster at calculating a basic math algorithm. Can this be true?

Sigmoid in C#

public static float Sigmoid(float value) 
{
    return 1.0f / (1.0f + (float) Math.Exp(-value));
}
//csc -o -debug-
.method public static float32 Sigmoid(float32 'value') cil managed
{
    .maxstack 8
    L_0000: ldc.r8 1
    L_0009: ldc.r8 1
    L_0012: ldarg.0 
    L_0013: neg 
    L_0014: conv.r8 
    L_0015: call float64 [mscorlib]System.Math::Exp(float64)
    L_001a: add 
    L_001b: div 
    L_001c: conv.r4 
    L_001d: ret 
}

Sigmoid in F#

let Sigmoid value = 1.0f/(1.0f + exp(-value));
//fsc --debug- --optimize+
.method public static float32 Sigmoid(float32 'value') cil managed
{
    .maxstack 5
    .locals init (
        [0] float32 num)
    L_0000: ldc.r4 1
    L_0005: ldc.r4 1
    L_000a: ldarg.0 
    L_000b: neg 
    L_000c: stloc.0 
    L_000d: ldloc.0 
    L_000e: conv.r8 
    L_000f: call float64 [mscorlib]System.Math::Exp(float64)
    L_0014: conv.r4 
    L_0015: add 
    L_0016: div 
    L_0017: ret 
}

The IL generated is nearly the same. The F# version allocates a little less memory using float32 variables on the stack where C# uses float64 and generally manages with a smaller stack, but the basic math operations are the same. Can these differences make an order of magnitude performance difference? Short answer is no.

Benchmarks are Tricky

C# sigmoid algorithm benchmark

// (c) Rinat Abdullin
// http://abdullin.com/journal/2009/1/5/caching-activation-function-is-not-worth-it.html
// sigmoidcs.cs

using System;
using System.Diagnostics;

static class App
{
	const float Scale = 320.0f;
	const int Resolution = 2047;
	const float Min = -Resolution/Scale;
	const float Max = Resolution/Scale;

	static readonly float[] _lut = InitLut();

	static float[] InitLut()
	{
	  var lut =  new float[Resolution + 1];
	  for (int i = 0; i < Resolution + 1; i++)
	  {
	    lut[i] = (float) (1.0/(1.0 + Math.Exp(-i/Scale)));
	  }
	  return lut;
	}

	static float Sigmoid1(double value)
	{
	  return (float) (1.0/(1.0 + Math.Exp(-value)));
	}

	static float Sigmoid2(float value)
	{
	  if (value <= Min) return 0.0f;
	  if (value >= Max) return 1.0f;
	  var f = value*Scale;
	  if (value >= 0) return _lut[(int) (f + 0.5f)];
	  return 1.0f - _lut[(int) (0.5f - f)];
	}

	static float TestError()
	{
	  var emax = 0.0f;
	  for (var x = -10.0f; x < 10.0f; x += 0.000001f)
	  {
	    var v0 = Sigmoid1(x);
	    var v1 = Sigmoid2(x);

	    var e = Math.Abs(v1 - v0);
	    if (e > emax) emax = e;
	  }
	  return emax;
	}

	static double TestPerformancePlain()
	{
	  var sw = new Stopwatch();
	  sw.Start();
	  for (int i = 0; i < 10; i++)
	  {
	    for (float x = -5.0f; x < 5.0f; x += 0.000001f)
	    {
	      Sigmoid1(x);
	    }
	  }

	  return sw.Elapsed.TotalMilliseconds;
	}

	static double TestPerformanceOfLut()
	{
	  var sw = new Stopwatch();
	  sw.Start();
	  for (int i = 0; i < 10; i++)
	  {
	    for (float x = -5.0f; x < 5.0f; x += 0.000001f)
	    {
	      Sigmoid2(x);
	    }
	  }
	  return sw.Elapsed.TotalMilliseconds;
	}

	static void Main()
	{
	  var emax = TestError();
	  var t0 = TestPerformancePlain();
	  var t1 = TestPerformanceOfLut();

	  Console.WriteLine("Max detected deviation using LUT: {0}", emax);
	  Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0);
	  Console.WriteLine("10^7 iterations using Sigmoid2() took {0} ms", t1);
	}
}

F# sigmoid algorithm benchmark

// (c) Rinat Abdullin
// http://abdullin.com/journal/2009/1/6/f-has-better-performance-than-c-in-math.html
// sigmoidfs.fs

#light

let Scale = 320.0f;
let Resolution = 2047;

let Min = -single(Resolution)/Scale;
let Max = single(Resolution)/Scale;

let range step a b =
  let count = int((b-a)/step);
  seq { for i in 0 .. count -> single(i)*step + a };

let lut = [| 
  for x in 0 .. Resolution ->
    single(1.0/(1.0 +  exp(-double(x)/double(Scale))))
  |]

let sigmoid1 value = 1.0f/(1.0f + exp(-value));

let sigmoid2 v = 
  if (v <= Min) then 0.0f;
  elif (v>= Max) then 1.0f;
  else
    let f = v * Scale;
    if (v>0.0f) then lut.[int (f + 0.5f)]
    else 1.0f - lut.[int(0.5f - f)];

let getError f = 
  let test = range 0.00001f -10.0f 10.0f;
  let errors = seq { 
    for v in test -> 
      abs(sigmoid1(single(v)) - f(single(v)))
  }
  Seq.max errors;

open System.Diagnostics;

let test f = 
  let sw = Stopwatch.StartNew(); 
  let mutable m = 0.0f;
  let result = 
    for t in 1 .. 10 do
      for x in 1 .. 1000000 do
        m <- f(single(x)/100000.0f-5.0f);
  sw.Elapsed.TotalMilliseconds;

printf "Max deviation is %f\n" (getError sigmoid2)
printf "10^7 iterations using sigmoid1: %f ms\n" (test sigmoid1)
printf "10^7 iterations using sigmoid2: %f ms\n" (test sigmoid2)
PS> csc -o -debug- .\sigmoidcs.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

PS> fsc --debug- --optimize+  .\sigmoidfs.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.
PS> .\sigmoidcs.exe
Max detected deviation using LUT: 0.001663984
10^7 iterations using Sigmoid1() took 2644.5935 ms
10^7 iterations using Sigmoid2() took 510.9379 ms
PS> .\sigmoidfs.exe
Max deviation is 0.001664
10^7 iterations using sigmoid1: 403.974300 ms
10^7 iterations using sigmoid2: 124.520100 ms

Wow. That’s 404ms for F# and 2,656 for C#. That’s a huge difference. Why?

We already established that the IL for the Sigmoid1() algorithm above compiles to very similar IL. What else could be different?

fsharp-test-func

Wait a minute.

let result = 
    for t in 1 .. 10 do
      for x in 1 .. 1000000 do
        m <- f(single(x)/100000.0f-5.0f);

The F# loop is not equivalent to the C# version. In the F# version the counter is an int and the C# version the counter is a float. (Also the F# version is passing a delegate to the test function.)

for (int i = 0; i < 10; i++)
{
	for (float x = -5.0f; x < 5.0f; x += 0.000001f)
	{
		Sigmoid1(x);
	}
}

Does that int counter thing make a big difference.

using System;
using System.Diagnostics;

static class App
{
	static float Sigmoid1(double value)
	{
	  return (float) (1.0/(1.0 + Math.Exp(-value)));
	}

	static double TestPerformancePlain()
	{
		var sw = new Stopwatch();
		sw.Start();
		float num = 0f;
		for (int i = 0; i < 10; i++)
		{
			for (int j = 1; j < 1000001; j++)
			{
				num = Sigmoid1((((float) j) / 100000f) - 5f);
			}
		}  
		return sw.Elapsed.TotalMilliseconds;
	}

	static void Main()
	{
	  var t0 = TestPerformancePlain();
	  Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0);
	}
}
PS> csc -o -debug- .\sigmoid-counter.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

PS> ./sigmoid-counter
10^7 iterations using Sigmoid1() took 431.6634 ms

Yup C# just executed that Sigmoid1() test in 431ms. Recall that F# was 404ms.

The other difference was that the F# code was passing a delegate of FsharpFunc<float,float> rather than making a method call. Lets try the same thing in C# using a Lambda expression to create an anonymous Func<float,float> delegate.

using System;
using System.Diagnostics;

static class App
{
	static double Test(Func<float,float> f)
	{
		var sw = new Stopwatch();
		sw.Start();
		float num = 0f;
		for (int i = 0; i < 10; i++)
		{
			for (int j = 1; j < 1000001; j++)
			{
		   		num = f((((float) j) / 100000f) - 5f);
			}
		}  
		return sw.Elapsed.TotalMilliseconds;
	}

	static void Main()
	{
	  var t0 = Test( x => { return (float) (1.0/(1.0 + Math.Exp(-x))); } );
	  Console.WriteLine("10^7 iterations using Sigmoid1() took {0} ms", t0);
	}
}
PS> ./sigmoid-counter
10^7 iterations using Sigmoid1() took 431.6634 ms
PS> csc -o -debug- .\sigmoid-lambda.cs
Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1
Copyright (C) Microsoft Corporation. All rights reserved.

PS> .\sigmoid-lambda.exe
10^7 iterations using Sigmoid1() took 275.1087 ms

Now C# is 130ms faster than F#.

For what it’s worth the C version of the Sigmoid1() benchmark executes in 166ms on my machine using CL 16 x64 on my machine.

  • C: 166ms
  • C# (delegate): 275ms
  • C# (method): 431ms
  • C# (method, float counter): 2,656ms
  • F#: 404ms
  • So, what have we learned?

  • Never, ever use a float as a counter in a for loop in C#.
  • Invoking a delegate in the CLR is slightly faster than a normal method invocation.
  • It’s very easy to measure something other than what you intended when benchmarking.
  • In as much as this little test means anything, C# and F# have similar performance. C is still faster.
%d bloggers like this: