Saturday, June 30, 2012

Dynamic Expression Evaluation In .NET

Many applications need to evaluate expressions at run-time. This can be done using a dynamic expression evaluator. Examples of such applications are data analysis applications that perform calculations from user-defined expressions, spreadsheet software like Excel which allows users to define formulae, etc.

There are several ways of doing it in .NET, each one having its own pros and cons. This post covers the various approaches and at the end compares their performance along with code examples for each one of them. These approaches can be categorized as follows

Parsing

One of the most common ways of achieving this is by writing a parser. The parser can parse the input expression, generate data structures and eventually evaluate the sub-expressions to find the result of the expression.

Developers can either write their parser or use one of the already available parsers. The biggest challenge with developers writing their own parser is that building a parser is quite complicated and requires a lot of coding effort. The advantage of using parsers for expression evaluation is that it doesn't require any external processes and the implementation can be pure managed .NET code.

Here is an expression evaluator from Code Project written in 100% Managed .NET. As the article points out, the expression goes through various phases like tokenization, parsing and interpretation before the final result is obtained. Fast Lightweight Expression Evaluator (Flee) is another expression evaluator from CodePlex. It uses a custom compiler, strongly-typed expression language and lightweight codegen to compile expressions directly to IL.

Runtime Compilation

Instead of building a custom parser, another alternative is using the .NET compilers. They can parse expressions in all the .NET languages and generate IL code. The best thing about compiled code is that execution of the code is quite fast but on the other hand it takes a significant time to compile the code.

The .NET framework includes a mechanism called the Code Document Object Model (CodeDOM) for this requirement. The System.CodeDom.Compiler namespace defines types for generating source code from CodeDOM graphs and managing the compilation of source code in supported languages. The .NET framework includes code generators and code compilers for C#, JScript and Visual Basic. Here is a tutorial on how CodeDom can be used for compiling C# code.

CodeDOM uses the compiler to compile the code, so it expects the compiler to exist on the clients system. The problem with CodeDom is that a new assembly is created and loaded into the AppDomain; and since we cannot unload an assembly in .NET, we end up with several unnecessary assemblies in memory. There are two solutions for this

  • Another temporary AppDomain can be created where the assemblies are loaded and the AppDomain can be disposed at a later time
  • Using DynamicMethod to create a method at runtime. The IL code for this can be obtained from the CodeDOM. Here is a tutorial for this

The Eval Function

Eval is a function which evaluates a string as though it was an expression and returns a result. Most of the languages that support eval are scripting languages like JavaScript, Python, etc. This is because most of the scripting languages are interpreted, not compiled. This provides these languages enormous capability to evaluate dynamic expressions since an eval call is nothing but another line to interpret at runtime. Compiled languages like Java and the .NET languages (like C# and VB.NET) don't have this flexibility since they have to generate IL/Byte code before they run. However some of the .NET language do support eval even though they are compiled.

JScript.NET is Microsoft's implementation of ECMAScript in the .NET initiative. JScript.NET contains an eval function which can evaluate dynamic expression. It compiles the code every time eval called and then runs it, so there definitely is a performance hit. The downside of eval has been explained in this article.

IronPython and IronRuby are the .NET implementations of the Python and Ruby scripting languages. These are dynamic scripting languages unlike C# and VB.NET which run on the Dynamic Language Runtime. With the DLR, several dynamic constructs like eval could be implemented and since it is based on .NET, it is interoperable with C# and VB.NET. Here is a tutorial on embedding IronPython in C#.

Microsoft Roslyn

Microsoft Roslyn is Microsoft's biggest change in compilers since the advent of .NET. However this is currently in CTP and isn't available in .NET 4.0. Roslyn provides the compiler as a service (library). This is better than CodeDOM since it won't create new processes. Roslyn has wonderful features especially for Dynamic Compilations, Code Reviewers, etc. Here is a tutorial on using Roslyn with C#.

The performance benchmarks of these approaches are as follows. The code used to obtain these benchmarks is available here.