August 2008 - Posts

As a reference for some planned and unplanned future posts, I wanted to share my “cheat sheet” for the C# 3.0 translation carried out for query expressions. Obviously it’s based on the C# 3.0 Language Specification, more specifically section 7.15.2. A few remarks that deserve more attention when reading the specification:

  • Words in bold are contextual keywords; these are not reserved keywords as the language doesn’t introduce new reserved keywords to ensure forward compatibility for programs written in previous language versions (e.g. “from” could be used as a variable name).
  • Method names indicated in red are the query operators known by the compiler as compilation targets for the corresponding query syntax contextual keywords. Where these methods come from is purely a question answered by method resolution rules, obviously including extension methods from C# 3.0 on.
  • The ‘*’ in the translations below stands for the transparent identifier, more of an implementation aspect of the compiler to “carry forward” the range variables captured in the anonymous type produced by a query operation (e.g. in SelectMany). This illustrates that anonymous types (as well as lambda expressions) are essential pieces of glue to make LINQ work (although alternatives based on a hypothetical “tuple type” runtime feature would work too).
  • For the orderby operator, corresponding flavors with descending order have been omitted. Essentially all of the ordering clauses can optionally have the descending keyword, resulting in an OrderByDescending or ThenByDescending call in the corresponding translation.
  • All the rules are mutually recursive: starting with a query expression, the translation will gradually “compile away” certain parts of the query expression syntax into method call driven equivalent expressions. Notice all rules without an ellipsis … on the left-hand side don’t have any query expression keywords left on the right-hand side – these are the terminal “closed” expressions that are entirely expressible in C# 3.0 \ { LINQ }. It’s like peeling off the LINQ layer from the C# 3.0 onion.
  • The first two rules for Select seem redundant, i.e. when substituting v for x in the second rule one gets the first rule. The reason they are spelled out each individually is because the first rule covers degenerate expressions (covered in another post of mine) and rules should be evaluated from top to bottom. The Select ( x => x ) is kept based on the conditions spelled out in section 7.15.2.3.
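To make the role of the transparent identifier a bit more tangible, here is a small sketch of one query written both ways. The class and method names are made up for illustration, and the second method is a hand-written approximation of what the translation rules produce, not the compiler's literal output:

```csharp
using System;
using System.Linq;

static class QueryTranslationSketch
{
    // Query expression with a let clause, which introduces a second range variable y.
    public static int[] WithQuerySyntax(int[] xs)
    {
        return (from x in xs
                let y = x * x
                where y > 4
                orderby y descending
                select x + y).ToArray();
    }

    // Hand-written equivalent using method calls only. The anonymous type plays
    // the role of the transparent identifier '*': it carries x and y forward.
    public static int[] WithMethodCalls(int[] xs)
    {
        return xs.Select(x => new { x, y = x * x })
                 .Where(t => t.y > 4)
                 .OrderByDescending(t => t.y)
                 .Select(t => t.x + t.y)
                 .ToArray();
    }
}
```

Both methods should yield the same sequence for any input; the second one contains no query keywords at all, which is the “peeled” form the rules converge to.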

So here’s my cheat sheet:

image

Feel free to redistribute it in any form but I’d appreciate at least a reference to the official C# 3.0 Language Specification with it as that’s the ultimate resource for the detailed language specifics. A pointer to my blog would be much appreciated as well of course :-).


Last time we talked about dealing with dynamic code generation using Reflection.Emit to generate calls to our dynamic binder. I’ve shown how to see the IL that gets generated by means of the Dynamic IL Visualizer add-in in Visual Studio. In this post I want to point out that you can actually live without that visualizer and go the hard-core way using SOS, aka “Son of Strike”, the WinDbg extension for managed code debugging that ships with the .NET Framework (sos.dll). A little-known fact is that SOS can be loaded directly in Visual Studio through the Immediate Window.

Note: This post is screenshot-driven; to conserve bandwidth I’ve included only thumbnails in the post itself; “click to enlarge” is the message.

Note: There are many ways to get to the information you care about; in the end we’re dealing with object graphs. This post is more an exploration of the data that you can get at if you want to do so; the various paths to get there are more a matter of style (in physical terms: the potential energy is constant, but the accumulated kinetic energy corresponding to the developer’s finger muscle contractions when dealing with the keyboard may vary).

 

Step 1 – Loading SOS

We’re on a breakpoint in managed code and will try to load SOS through the Immediate Window using .load sos:

image

For this to work, native code debugging needs to be enabled in the project properties:

image

 

Step 2 – Finding dynamic methods

To find the dynamically generated methods, we can search the heap for objects of type DynamicMethod using !dumpheap -type DynamicMethod. As you can see we don’t have any yet:

image

Using F10 we step to the next method, executing Compile causing the dynamic method for the expression to be compiled. Rerunning the command now yields:

image

We’re in business. As you can see we had two matching types: DynamicMethod and DynamicMethod+RTDynamicMethod. We’re interested in the former one, so we follow the MT field back to the list of objects. MT stands for Method Table, which is a runtime structure describing the object layout for a particular type. It can be used to locate fields and methods in preparation for calls. If you want, you can use !dumpmt -MD <MT address> to dump such a table; this will show all of the “methods in the table”.

 

Step 3 – Dumping the IL

Now that we have the DynamicMethod object, we can dump the IL for it using !dumpIL <object address>:

image

As we were compiling the identity function I, the IL shown should just be a ldarg.0, and sure enough it is.

 

Step 4 – Another sample

We could have stopped with the previous sample but our dynamic dispatch is interesting to analyze too. Breaking after our Substring sample compilation (in a fresh run, so that the old DynamicMethod is no longer displayed) results in:

image

We can follow it through to the IL again:

image

Notice the call to “something in parentheses”. This is the runtime token representing the method, which we can dump using !do <address>. In addition, !dumpIL hints at dumping the token array:

image

The result of doing so is an array of tokens used in the IL stream. I’ll show this in a second but you might wonder what those fancy numbers in the IL stream like “70000002” stand for. Those indicate token types as defined in CorTokenType (see corhdr.h, for instance in the SSCLI “Rotor” distribution). To find the token type, mask with 0xFF000000; in our sample that leaves 0x70000000, which is defined as mdtString. The remainder (mask with 0x00FFFFFF) leaves us with the RID (row ID) of 0x00000002, so entry two in the table contains the corresponding token value:

image

And yes, we see “Substring” appearing. Similarly, value 02000003 in IL_0010 gives us a type def token (for System.Object) at offset 3 and value 0a000004 in IL_0031 gives us a member ref token (for our DynamicBinder.Call method) at offset 4. For the latter, the debugger output gives you the indirection in parentheses straight away:

image

How convenient!
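The masking arithmetic is easy to reproduce in code. The following is a minimal sketch with made-up class and method names; the four constants are the CorTokenType values mentioned in this post, and nothing here touches the actual token array of the dynamic method:

```csharp
using System;

static class TokenSketch
{
    // CorTokenType values (see corhdr.h).
    public const uint mdtTypeRef   = 0x01000000u;
    public const uint mdtTypeDef   = 0x02000000u;
    public const uint mdtMemberRef = 0x0a000000u;
    public const uint mdtString    = 0x70000000u;

    // The high byte of a token identifies its type.
    public static uint TokenType(uint token) { return token & 0xFF000000u; }

    // The low three bytes are the RID, used as the index into the token array.
    public static uint Rid(uint token) { return token & 0x00FFFFFFu; }

    public static string Describe(uint token)
    {
        return string.Format("type 0x{0:x8}, RID {1}", TokenType(token), Rid(token));
    }
}
```

Feeding it 0x70000002 splits the value into the mdtString type and RID 2, matching the manual decoding above.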

 

Step 5 - From token to MethodInfo to DeclaringType

Next we can follow the object chains all the way up to the declaring type: first we get from the token to the method it represents and second we can traverse m_declaringType to get the corresponding type:

image

This produces a System.RuntimeType instance containing a Method Table reference:

image

So we can dump the method table with !dumpmt -MD <MT address> as follows:

image

As you can see we’re in our DynamicBinder class.

 

Step 6 – What about the signature?

Instead of following the path to the m_declaringType above, we could follow the m_signature as well:

image

System.Signature wraps an m_signature field of type SignatureStruct which we can dump as well. As this is a value type we can’t use !do; instead we need !dumpvc (standing for value class), passing in the method table (otherwise the debugger wouldn’t know how to interpret the struct, which is simply a blob of data) and the address.

 

Step 7 – And back to the method being called

From the signature struct we can reach out to the method represented by the signature through m_pMethod. This is a runtime handle, wrapping an IntPtr pointer to where the loaded IL code lives. In other words, we can invoke !dumpIL on that address to see the contents of what turns out to be our DynamicBinder.Call method:

image

(Notice the closure classes in the IL generated because of our C# 3.0 lambda expressions embedded in the LINQ query we use for method overload resolution.)

 

Step 8 – Dynamic modules

Dynamic methods live in dynamic modules, so it would be interesting to see those. SOS comes to the rescue with !dumpdomain.

image

Scroll down a bit and we get to see the dynamic modules. With some poking around it turns out that the last one is the one we’re looking for as it contains a multicast delegate, precisely what we've been generating. We can visualize it using !dumpmodule -mt <address>, which also shows the types that are referenced inside the module (that’s where we see the MulticastDelegate).

image

Also notice the attribute saying “Reflection”. To see how the type definition (TypeDef) is indirected through a method table map (MethodTableMap), follow the arrows below. Green indicates the indexing and again the masked high byte (the first two hex digits) of the green value indicates the CorTokenType, in this case a TypeDef. You can do the same exercise for the TypeRef (0x01000000) – follow the memory pointer and get the first (zero-based count obviously) element referred in there (it should read 24 e5 0f 79 as you can infer from the output produced by the debugger, considering little endian conversion).

image

Unfortunately the defined type is marked as unloaded so we’re at a dead end – feel free to explore the Method Table for the type a bit further. In a similar way you can track down the loaded module with our DynamicBinder to do some explorations (also starting from !dumpdomain):

image

Dumping the binder’s method table gives us:

image

Who still needs ILDASM or Reflector? :-)

 

Step 9 – Walking the stack

To finish this post, I’d like to show how to dump the stack objects using !dso to get to some of this information through yet an alternative route:

image

In green I’ve indicated the result of the dynamic call and what it corresponds to. However, the items in red indicate our (mysterious?) unloaded type once more. As you’ll see though, the IL is still available:

image

The result is predictable by now I guess.


Last time in this series we were able to compile a stunningly complex “dynamic lambda” x => x – also known as I in the world of combinators – into IL code at runtime. As that’s not particularly useful, we want to move on to slightly more complex expressions like:

var o = DynamicExpression.Parameter("o");
var a = DynamicExpression.Parameter("a");
var b = DynamicExpression.Parameter("b");
var call = DynamicExpression.Call(o, "Substring", a, b);
var func = DynamicExpression.Lambda(call, o, a, b);
Console.WriteLine(func);
Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));

Or, in pretty print,

(o, a, b) => o.Substring(a, b)

 

Setting the scene

We explained how the translation of a dynamic expression tree takes place in general: as we traverse the tree, individual nodes are visited asking them to append code capturing the expression’s semantics to an IL stream, pushing a value on the stack that corresponds to the evaluated expression. The method that does this translation for every dynamic expression is called “Compile”:

/// <summary>
/// Appends IL instructions to calculate the expression's runtime value, putting it on top of the evaluation stack.
/// </summary>
/// <param name="ilgen">IL generator to append to.</param>
/// <param name="ldArgs">Lambda argument mappings.</param>
protected internal abstract void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs);

In here we’re using a simplified concept of “lambda parameters in scope” using ldArgs, avoiding getting into slightly more complex techniques such as hoisting that are required for more involved expression trees. Previously you saw how to implement this method for ParameterDynamicExpression and LambdaDynamicExpression, respectively:

protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
{
    if (!ldArgs.ContainsKey(this))
        throw new InvalidOperationException("Parameter expression " + Name + " is not in scope.");

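    // Note: Emit(OpCode, int) writes a 4-byte operand, but ldarg only takes a 2-byte one.
    // The two surplus zero bytes decode as the nop instructions visible in the IL dumps;
    // passing (short)ldArgs[this] instead would avoid them.
    ilgen.Emit(OpCodes.Ldarg, ldArgs[this]);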
}

and

protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
{
    Body.Compile(ilgen, ldArgs);
    ilgen.Emit(OpCodes.Ret);
}
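Put together, these two Compile methods are all that is needed for the identity lambda x => x. As a rough, stand-alone illustration of the IL they amount to, the sketch below emits it by hand with a DynamicMethod. The class name IdentitySketch is invented, and the operand is passed as a short because ldarg takes a 16-bit argument index:

```csharp
using System;
using System.Reflection.Emit;

static class IdentitySketch
{
    public static Func<object, object> Build()
    {
        // object I(object x)
        DynamicMethod method = new DynamicMethod("I", typeof(object), new Type[] { typeof(object) });
        ILGenerator ilgen = method.GetILGenerator();

        // What the parameter expression contributes: push argument 0.
        ilgen.Emit(OpCodes.Ldarg, (short)0);

        // What the lambda expression contributes after compiling its body: return.
        ilgen.Emit(OpCodes.Ret);

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }
}
```

Calling IdentitySketch.Build()("Bart") hands back the very same object that was passed in.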

 

Dynamic method calls and binders

For MethodCallDynamicExpression things are a bit more involved than for the expression types above. Before we start, remember the most important portion of the Compile method contract: leave one value on top of the stack that corresponds to the evaluated expression, in this case a method call. What does a method call consist of? Here are the ingredients:

public ReadOnlyCollection<DynamicExpression> Arguments { get; private set; }
public DynamicExpression Object { get; private set; }
public string Method { get; private set; }

As we’re seeing other DynamicExpression objects being referenced in here, it’s already clear we’ll have to evaluate those by recursively calling Compile. So, we could do something along the lines of:

compile Object
foreach argument in Arguments
    compile argument
call Method

That’s the typical structure of a call site, pushing arguments on the stack including the object to invoke the method on. From a stack point of view: n + 1 values are pushed, where n is the number of arguments and 1 accounts for the instance to invoke the method on, and next all of those stack citizens are eaten by the method call, producing the single return value on top of the stack. This follows the contract of our Compile method.

There’s a slight problem though: we can’t emit the call because we don’t know what type to invoke it on. The reason is well-known by now: neither the Object nor the Arguments carry strongly-typed information, so given just a string named “Method”, we can’t get the required method metadata to emit a call(virt) instruction. Bummer. But that’s the whole point of dynamic programming: delaying the decision about the executed method/function till runtime because the type might dynamically grow with new members (think of ETS in PowerShell as a sample of such a capability).

One way to solve this problem is to emit a bunch of reflection code to investigate the type of Object at runtime, do the same for all of the arguments, try to find a suitable method to call, etc. I needn’t explain how complicated this would become :-). There are lots of drawbacks to this: we’re baking the whole dynamic call infrastructure into the call site, and as we’re emitting all of that code, there’s no way to adapt it without regenerating the code. This whole “locate a suitable method” algorithm could be made extensible too if we don’t emit it into the generated code straight away. In other words, we want to get out of the IL generating business as soon as we can, and introduce a level of indirection. That particular kind of indirection is what we call a binder.

So what’s a binder precisely? It’s simply a class that contains all the functionality to make well-formed decisions (based on certain desired semantics) about method calls (amongst other invocation mechanisms). Actually we have such a thing in the framework already: System.Reflection.Binder. As the documentation says:

Selects a member from a list of candidates, and performs type conversion from actual argument type to formal argument type.

The list of candidates is something that can be made extensible, allowing methods to be “imported” or “attached” to existing types at runtime. The type conversion clause in the sentence above outlines that the binder is responsible for taking the actual passed-in arguments (in our case weakly typed) and turning them into (i.e. casting to) formal argument types that are suitable for consumption by the selected candidate method. The sample on MSDN for System.Reflection.Binder shows what it takes to implement such a beast. We’re not going to do that though, just to simplify matters a bit. As we’re only interested in method calls, we’ll just implement the bare minimum binder to get the job done and explained. Furthermore, we won’t spend time on implicit conversions for built-in types (like int to long) as the mentioned sample illustrates that already. Last but not least, generics are not brought into the equation either.
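For comparison, the framework’s own binder can be exercised without writing one: Type.InvokeMember falls back to the default binder when null is passed for its binder parameter. The sketch below wraps that call; the class name DefaultBinderSketch and the helper CallByName are invented for this illustration and are not part of the series’ code:

```csharp
using System;
using System.Reflection;

static class DefaultBinderSketch
{
    // Late-bound instance method call by name, resolved by the default binder.
    public static object CallByName(object target, string methodName, params object[] args)
    {
        return target.GetType().InvokeMember(
            methodName,
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null,     // null means: use the default binder
            target,
            args);
    }
}
```

DefaultBinderSketch.CallByName("Bart", "Substring", 1, 2) should select String.Substring(int, int) at runtime. This is handy as a baseline, but it gives us no hook to extend the list of candidates, which is why the series rolls its own binder.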

Without further delay, let’s show a possible binder implementation:

using System;
using System.Linq;
using System.Text;

public class DynamicBinder
{
    public static object Call(object @this, string methodName, params object[] args)
    {
        //
        // Here we're going to be lazy for demo purposes only. Our overload resolution
        // accepts a call only if exactly one applicable method is found (SingleOrDefault
        // throws when several match) without applying "betterness" rules
        // as outlined in the C# specification (v2.0, section §7.4.2). We don't care
        // about extension methods either (how could the namespace be brought in scope
        // in the context of an expression tree...?) nor other dynamic type extensions
        // such as IExpando (~ IDispatchEx) or e.g. PowerShell ETS.
        //

        var result = (from method in @this.GetType().GetMethods()
                      where method.Name == methodName
                      let parameters = method.GetParameters()
                      where parameters.Length == args.Length
                            && parameters.Where((p, i) => p.ParameterType.IsAssignableFrom(args[i].GetType())).Count() == args.Length
                      select new { Method = method, Parameters = parameters }).SingleOrDefault();

        if (result == null)
        {
            StringBuilder sb = new StringBuilder();
            sb.Append("Failed to bind method call: ");
            sb.Append(@this.GetType());
            sb.Append(".");
            sb.Append(methodName);
            sb.Append("(");

            int n = args.Length;
            for (int i = 0; i < n; i++)
                sb.Append(args[i].GetType().ToString() + (i != n - 1 ? ", " : ""));

            sb.Append(").");
            throw new InvalidOperationException(sb.ToString());
        }

        return result.Method.Invoke(@this, args);
    }
}

This needs some explanation I assume. The signature should be straightforward: given an object @this, we want to call method methodName with zero or more arguments args. The result of this will be an object again (notice we don’t support void return types for methods being called, which isn’t too big a deal when considering functions as lambdas – i.e. no statement lambdas). What’s more interesting though is the way we find a suitable method. I chose to write it as a gigantic LINQ expression just to show how powerful LINQ can be. Let me walk you through it:

var result = (from method in @this.GetType().GetMethods()
              where method.Name == methodName
              let parameters = method.GetParameters()

For all methods available on the left-hand side of the call (i.e. @this) select those methods that have the same name (case sensitive compare – this would be a binder that mimics C# name resolution for method calls) and let parameters be a variable containing the parameters for each of the selected methods going forward. In other words, in what follows we’re seeing a sequence of (method, parameters) pairs mapping each suitable (at least concerning the name) method to the parameters it takes. Next we need to do overload resolution:

              where parameters.Length == args.Length

Here we make sure the number of parameters on the candidate method matches the number of arguments passed in to the binder’s Call call. This implies we don’t consider things like optional arguments supported by some languages, where supplying fewer arguments than there are parameters (but not more!) would keep the method as a candidate, although there would need to be some ordering to make sure that methods with more arguments take precedence over methods with arguments supplied through optional values. Notice this simple check also makes it impossible to call a “params” method without stuffing the arguments in an array up front.

                    && parameters.Where((p, i) => p.ParameterType.IsAssignableFrom(args[i].GetType())).Count() == args.Length

Now we’re in the clause that’s maybe the most interesting. Here we’re taking all the parameters of the candidate and checking that the parameter p at position i has a type that’s assignable from the type of the argument passed in to the binder’s Call method. In essence this is contravariance for arguments. Assume we’re examining a candidate method like this:

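class Animal { }
class Mammal : Animal { }
class Giraffe : Mammal { }
class ExperimentalZoo
{
    // Public, because GetMethods() without binding flags only returns public methods.
    public Animal CloneBeast(Mammal g) { return g; /* placeholder body */ }
}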

and we’re calling the binder as follows:

DynamicBinder.Call(new ExperimentalZoo(), "CloneBeast", new Giraffe());

As we’re calling the binder with an argument of type Giraffe (args[0].GetType()) and Giraffe inherits from Mammal (parameters[0].ParameterType), the candidate is compatible. However, if we’d call the method with an argument of type Goldfish it would clearly not be compatible (as a fish is not a mammal). This is precisely what the Where clause above enforces. The Count() == args.Length trick at the end makes sure all of the arguments pass the test (using the All operator would be ideal but it has no overload that passes in the index; alternatively a Zip operator would be beneficial too).
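One possible way around the missing indexed All overload is to run All over the index range instead. The sketch below shows that variation on its own; every type name in it (SketchAnimal, SketchZoo, ApplicabilitySketch and so on) is invented for the illustration, and like the binder it assumes no argument is null:

```csharp
using System;
using System.Linq;
using System.Reflection;

class SketchAnimal { }
class SketchMammal : SketchAnimal { }
class SketchGiraffe : SketchMammal { }
class SketchGoldfish : SketchAnimal { }

class SketchZoo
{
    public SketchAnimal CloneBeast(SketchMammal m) { return m; }
}

static class ApplicabilitySketch
{
    // True if every argument can be assigned to the parameter at the same position.
    public static bool IsApplicable(MethodInfo candidate, object[] args)
    {
        ParameterInfo[] parameters = candidate.GetParameters();
        return parameters.Length == args.Length
            && Enumerable.Range(0, args.Length)
                         .All(i => parameters[i].ParameterType.IsAssignableFrom(args[i].GetType()));
    }
}
```

With the CloneBeast method of SketchZoo as the candidate, a SketchGiraffe argument passes the test while a SketchGoldfish does not, mirroring the zoo reasoning above.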

Finally we have the select clause:

              select new { Method = method, Parameters = parameters }).SingleOrDefault();

which simply extracts the method (of type MethodInfo) and the parameters (of type ParameterInfo[]) and makes sure we only found one match. This is another simplification for illustrative purposes only – to be fully compliant with e.g. the C# language, we’d have to implement all of the overload resolution rules including “betterness rules” that select the best overload. More information on this can be found in the C# specification, in v3.0 under “7.4.3 Overload Resolution”. The key takeaway though is that we can tweak this binder as much as we want (e.g., left as an exercise, we could implement resolution that takes extension methods into account) without affecting the generated IL code that will simply call into the binder’s Call method.

If we find one result, we can just go ahead and call it by calling through the retrieved Method using the Invoke method, passing in the @this pointer and the args array.

 

Connecting the pieces

Now that we have our beloved binder, we need to glue it together with our dynamic expression compilation. In concrete terms this means we need to emit a call to DynamicBinder.Call in the generated IL for the MethodCallDynamicExpression. This isn’t too hard either:

protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
{
    if (Object == null)
        ilgen.Emit(OpCodes.Ldnull);
    else
        Object.Compile(ilgen, ldArgs);

    ilgen.Emit(OpCodes.Ldstr, Method);

    ilgen.Emit(OpCodes.Ldc_I4, Arguments.Count);
    ilgen.Emit(OpCodes.Newarr, typeof(object));

    LocalBuilder arr = ilgen.DeclareLocal(typeof(object[]));
    ilgen.Emit(OpCodes.Stloc, arr);

    int i = 0;
    foreach (DynamicExpression arg in Arguments)
    {
        ilgen.Emit(OpCodes.Ldloc, arr);
        ilgen.Emit(OpCodes.Ldc_I4, i++);
        arg.Compile(ilgen, ldArgs);
        ilgen.Emit(OpCodes.Stelem_Ref);
    }

    ilgen.Emit(OpCodes.Ldloc, arr);

    ilgen.EmitCall(OpCodes.Call, typeof(DynamicBinder).GetMethod("Call"), null);
}

What’s going on here? First, we check whether an Object has been specified. This is more of an extensibility point in case our binder would like to implement global functions (also left as an exercise, for example you could recognize null.Add(1, 2) as a global Add call, translating into Math.Add(…); or, the Method property could be set to “Math.Add” to denote a static method call). We’ll assume the else case holds true for our samples, causing us to call Compile recursively on the Object dynamic expression. This will add the value corresponding to the Object expression tree’s evaluation on top of the stack (note: you can smell call-by-value semantics already, don’t you?). Next, we load the string specified in the Method property onto the stack as well. Currently the stack looks like:

(string) Method
(object) Object.Compile result

Now we get into interesting stuff as our binder’s Call method expects to see an object[] as its third parameter. How many elements? One for each of the DynamicExpression objects in the Arguments collection, so we do Newarr passing in the type object, after pushing the number of elements to be allocated on the stack using Ldc_I4 passing in Arguments.Count. Now that we have our array, we can store it in a local variable we call “arr”. Time to fill the array by first loading the local, then pushing the index followed by a push of the argument’s value – again obtained by a recursive Compile call on the argument “arg” – and finally calling stelem_ref (as we’re dealing with System.Object we need _ref). The loop invariant is that it doesn’t change the stack height: each iteration cleanly loads three “arguments” that stelem_ref consumes, which brings the stack delta back to 0.

Ultimately, we load the array local variable and the stack looks like (semantically):

(object[]) Arguments.Select(arg => arg.Compile()).ToArray()
(string) Method
(object) Object.Compile()

ready for a call to DynamicBinder.Call which turns the stack into:

(object) DynamicBinder.Call(Object.Compile(), Method, Arguments.Select(arg => arg.Compile()).ToArray())

Again, we have managed to keep the house clean with regards to the stack behavior, i.e. the element on top of the stack contains the value corresponding to the entire (MethodCall)DynamicExpression.
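To see the emitted call site pattern in isolation, here is a stand-alone sketch that emits it by hand for the fixed case o.Substring(a, b). It is not the series’ code: MiniBinder is a deliberately naive stand-in for DynamicBinder (exact type match only), EmitBinderSketch is an invented name, and short-form opcodes are used instead of going through the expression tree:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

static class MiniBinder
{
    // Naive stand-in for the binder: looks up a method whose parameter types
    // exactly match the runtime types of the (non-null) arguments.
    public static object Call(object target, string name, object[] args)
    {
        Type[] types = Array.ConvertAll(args, a => a.GetType());
        MethodInfo method = target.GetType().GetMethod(name, types);
        if (method == null)
            throw new InvalidOperationException("Failed to bind " + name + ".");
        return method.Invoke(target, args);
    }
}

static class EmitBinderSketch
{
    public static Func<object, object, object, object> BuildSubstringCall()
    {
        Type[] parameterTypes = new Type[] { typeof(object), typeof(object), typeof(object) };
        // Associate the method with this module and skip visibility checks so the
        // emitted code may call the non-public MiniBinder.
        DynamicMethod method = new DynamicMethod(
            "SubstringSketch", typeof(object), parameterTypes, typeof(EmitBinderSketch).Module, true);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);                 // the object to call on (o)
        il.Emit(OpCodes.Ldstr, "Substring");      // the method name

        il.Emit(OpCodes.Ldc_I4_2);                // new object[2]
        il.Emit(OpCodes.Newarr, typeof(object));
        LocalBuilder arr = il.DeclareLocal(typeof(object[]));
        il.Emit(OpCodes.Stloc, arr);

        il.Emit(OpCodes.Ldloc, arr);              // arr[0] = a
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Stelem_Ref);

        il.Emit(OpCodes.Ldloc, arr);              // arr[1] = b
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Stelem_Ref);

        il.Emit(OpCodes.Ldloc, arr);              // stack: o, "Substring", arr
        il.EmitCall(OpCodes.Call, typeof(MiniBinder).GetMethod("Call"), null);
        il.Emit(OpCodes.Ret);

        return (Func<object, object, object, object>)
            method.CreateDelegate(typeof(Func<object, object, object, object>));
    }
}
```

Invoking the resulting delegate with "Bart", 1 and 2 goes through MiniBinder.Call and ends up in String.Substring(int, int). The binder can change freely afterwards; the emitted IL only knows about the Call signature.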

 

Testing it

Does it work? Let’s try with our running sample:

var o = DynamicExpression.Parameter("o");
var a = DynamicExpression.Parameter("a");
var b = DynamicExpression.Parameter("b");
var call = DynamicExpression.Call(o, "Substring", a, b);
var func = DynamicExpression.Lambda(call, o, a, b);
Console.WriteLine(func);
Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));

Recognize the patterns in the output IL?

image

A quick walk-through:

  • IL_0000 loads MethodCallDynamicExpression.Object which in turn was compiled into a ldarg V_0 by the ParameterDynamicExpression’s Compile method (this corresponds to “o”)
  • IL_0006 loads MethodCallDynamicExpression.Method
  • IL_000b to IL_0015 prepares the array for the method call arguments to be passed to the binder
  • IL_0016 to IL_0022 puts the first argument (corresponding to “a” translated into ldarg V_1 through ParameterDynamicExpression.Compile) in the array
  • IL_0023 to IL_002f does the same for the second argument (corresponding to “b” translated into ldarg V_2 through ParameterDynamicExpression.Compile)
  • IL_0030 to IL_0036 finally makes the call through the binder, passing in the results of the above and returning the value produced by the binder

If we now set a breakpoint in the DynamicBinder.Call method and let execution continue, we’ll see:

image

The third line in the Call Stack is where DynamicInvoke is happening:

Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));

and through the “External Code” corresponding to our emitted dynamic method we got back into the DynamicBinder that now will pick the right Substring method given lhs “Bart” and arguments 1 and 2. Ultimately the following prints to the screen:

image

Magic. To show it’s really extensible we can start to compose things endlessly with our two main ingredients: parameter and method call expressions. Here’s a sample (reverse engineering the nested DynamicExpression factory calls is left as an exercise):

image

Also left as an exercise to the reader is to find values for o and a through h that produce the displayed output above :-). For the record, here’s the corresponding IL:

IL_0000: ldarg      V_0
IL_0004: nop       
IL_0005: nop       
IL_0006: ldstr      "Replace"
IL_000b: ldc.i4     2
IL_0010: newarr     Object
IL_0015: stloc.0   
IL_0016: ldloc.0   
IL_0017: ldc.i4     0
IL_001c: ldarg      V_3
IL_0020: nop       
IL_0021: nop       
IL_0022: stelem.ref
IL_0023: ldloc.0   
IL_0024: ldc.i4     1
IL_0029: ldarg      V_4
IL_002d: nop       
IL_002e: nop       
IL_002f: stelem.ref
IL_0030: ldloc.0   
IL_0031: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_0036: ldstr      "Substring"
IL_003b: ldc.i4     2
IL_0040: newarr     Object
IL_0045: stloc.1   
IL_0046: ldloc.1   
IL_0047: ldc.i4     0
IL_004c: ldarg      V_1
IL_0050: nop       
IL_0051: nop       
IL_0052: stelem.ref
IL_0053: ldloc.1   
IL_0054: ldc.i4     1
IL_0059: ldarg      V_2
IL_005d: nop       
IL_005e: nop       
IL_005f: stelem.ref
IL_0060: ldloc.1   
IL_0061: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_0066: ldstr      "Replace"
IL_006b: ldc.i4     2
IL_0070: newarr     Object
IL_0075: stloc.2   
IL_0076: ldloc.2   
IL_0077: ldc.i4     0
IL_007c: ldarg      V_5
IL_0080: nop       
IL_0081: nop       
IL_0082: stelem.ref
IL_0083: ldloc.2   
IL_0084: ldc.i4     1
IL_0089: ldarg      V_6
IL_008d: nop       
IL_008e: nop       
IL_008f: stelem.ref
IL_0090: ldloc.2   
IL_0091: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_0096: ldstr      "PadRight"
IL_009b: ldc.i4     2
IL_00a0: newarr     Object
IL_00a5: stloc.3   
IL_00a6: ldloc.3   
IL_00a7: ldc.i4     0
IL_00ac: ldarg      V_7
IL_00b0: nop       
IL_00b1: nop       
IL_00b2: stelem.ref
IL_00b3: ldloc.3   
IL_00b4: ldc.i4     1
IL_00b9: ldarg      V_8
IL_00bd: nop       
IL_00be: nop       
IL_00bf: stelem.ref
IL_00c0: ldloc.3   
IL_00c1: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_00c6: ldstr      "ToUpper"
IL_00cb: ldc.i4     0
IL_00d0: newarr     Object
IL_00d5: stloc.s    V_4
IL_00d7: ldloc.s    V_4
IL_00d9: call       System.Object Call(System.Object, System.String, System.Object[])/BinderFun.DynamicBinder
IL_00de: ret       

Enjoy! Next time … who knows what?


Welcome back to the dynamic expression tree fun. Last time we designed our simplified expression tree class library we’ll be using to enable dynamic treatment of objects. Today, we’ll take this one step further by emitting IL code that resolves the operations invoked on such dynamic objects at runtime through a mechanism called binders. Before we dive in, let me point out that everything discussed in this series is greatly simplified just to illustrate the core ideas and base mechanisms/principles that make dynamic language stuff work.

 

Introducing IL generation

Dynamic code compilation is a wonderful thing. It’s not that hard once you get the basics right (and have some level of IL opcode understanding) but quite hard to debug. Luckily we have tools like Haibo Luo’s IL Visualizer. Since I’ll be using this, download it, extract the ZIP file, compile the whole solution and copy ILMonitor\bin\Debug\*.dll to %programfiles%\Microsoft Visual Studio 9.0\Common7\Packages\Debugger\Visualizers. Alternatively you can put it in your personal Visual Studio 2008\Visualizers folder.

So, what’s our task? Assume we have the following piece of sample code:

class Program
{
    static void Main(string[] args)
    {
        var o = DynamicExpression.Parameter("o");
        var a = DynamicExpression.Parameter("a");
        var b = DynamicExpression.Parameter("b");
        var call = DynamicExpression.Call(o, "Substring", a, b);
        var func = DynamicExpression.Lambda(call, o, a, b);
        Console.WriteLine(func);
        Console.WriteLine(func.Compile().DynamicInvoke("Bart", 1, 2));
    }
}

We already know how to construct the objects and how to represent them as a string (which would be (o, a, b) => o.Substring(a, b)). Now we need to focus on the Compile method on LambdaDynamicExpression, called on the last line of Main. Starting with the signature of the delegate, we want to create (at runtime) a method that takes in three “dynamic” parameters (corresponding to parameter expressions o, a and b), returning a resulting object. Since we don’t have any type information available, everything should be System.Object, so we’d end up with the following delegate:

delegate object TheDynamicLambdaFunction(object o, object a, object b);

Looking at Compile as a black box, it will return an instance of this delegate pointing at an on-the-fly generated method corresponding to the lambda’s body expression. Since it returns a System.Delegate, one can call DynamicInvoke (or cast it to a compatible delegate) to invoke it with the given parameters. Obviously we want the call to do “the right thing”: in the sample above it would correspond to a method call to System.String::Substring on “Bart”, passing in startIndex 1 and length 2, producing another string containing “ar”.
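To make the two invocation styles concrete, here is a small sketch that doesn’t use any of our expression tree classes. The delegate body is hand-written with explicit casts and merely stands in for what Compile is supposed to produce; InvokeSketch and MakeStandIn are names made up for this illustration.

```csharp
using System;

// Same shape as the delegate type shown above: everything is System.Object.
delegate object TheDynamicLambdaFunction(object o, object a, object b);

static class InvokeSketch
{
    // Hypothetical stand-in for the result of Compile(); the casts below are
    // hard-coded here, whereas the real thing has to resolve the call at runtime.
    public static Delegate MakeStandIn()
    {
        TheDynamicLambdaFunction f = delegate(object o, object a, object b)
        {
            return ((string)o).Substring((int)a, (int)b);
        };
        return f;
    }

    static void Main()
    {
        Delegate d = MakeStandIn();

        // Late-bound: arguments travel as an object[] and are checked at runtime.
        Console.WriteLine(d.DynamicInvoke("Bart", 1, 2)); // prints "ar"

        // Early-bound: cast to a compatible delegate type and call it directly.
        var typed = (TheDynamicLambdaFunction)d;
        Console.WriteLine(typed("Bart", 1, 2)); // prints "ar"
    }
}
```

DynamicInvoke is the convenient option when the delegate type isn’t known at compile time, which is exactly our situation once delegate types get generated on the fly.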

It should be clear that we need to emit IL on the fly to translate the lambda expression but also the lambda’s body which could be anything, not just a MethodCallDynamicExpression. Since we lack other expression node types, one such (trivial) thing would be:

var x = DynamicExpression.Parameter("x");
var I = DynamicExpression.Lambda(x, x);
Console.WriteLine(I);
Console.WriteLine(I.Compile().DynamicInvoke("Bart"));

which is just the identity function (the first ‘x’ passed to Lambda above is the lambda’s body; the second one is its parameter). I intentionally named the expression above “I” after the SKI combinators, where I is defined as λx . x. Similarly we could define the K combinator as:

var x = DynamicExpression.Parameter("x");
var y = DynamicExpression.Parameter("y");
var K = DynamicExpression.Lambda(x, x, y);
Console.WriteLine(K);
Console.WriteLine(K.Compile().DynamicInvoke("Bart", 123));

but we got sidetracked, so time to move on. The whole point here is that we can’t assume the body of the lambda to be a MethodCallDynamicExpression. So, how do we tackle this? An important observation one can make is this: an expression represents a single value. Right, so what? Wait a minute, is IL-code not stack-based? Adding the two things together we could think of the following solution:

Calling a Compile method on an expression tree object, given a writeable stream for IL instructions, should add all the instructions to the stream required to evaluate the expression, leaving the result of the evaluation on top of the stack.

A LambdaDynamicExpression is the only dynamic expression that supports a publicly visible Compile method. Its pseudo-code would look like:

  1. Create an IL stream; here the IL stack is empty.
  2. Take the Body expression and compile it by emitting IL instructions for it; this causes the IL stack to be one high.
  3. Add an IL return instruction to return the object on top of the stack.
  4. Return a delegate pointing to the method represented by the generated IL code.

 

Supporting expression compilation

To make this work, we’ll first extend the base class by adding one more method:

/// <summary>
/// Class representing a dynamic expression tree.
/// </summary>
abstract class DynamicExpression
{
    /// <summary>
    /// Appends IL instructions to calculate the expression's runtime value, putting it on top of the evaluation stack.
    /// </summary>
    /// <param name="ilgen">IL generator to append to.</param>
    /// <param name="ldArgs">Lambda argument mappings.</param>
    protected internal abstract void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs);

}

This method will take in two things: the IL generator (referred to as “IL stream” in the previous paragraph) and a mapping table for the lambda’s parameter expressions. Why do we need the latter, or better: what does it map the parameter expressions to? Assume we’re compiling

(o, a, b) => o.Substring(a, b)

While traversing the expression tree, asking every node in the correct order to emit IL instructions, we’ll encounter references to the parameters again. Our goal is to write a dynamic method looking like this:

object GeneratedDynamicMethod(object o, object a, object b)
{
    return o.Substring(a, b);
}

where the . obviously denotes a dynamic method call in this case. As we encounter parameter expressions like o, a or b during the translation of the method body, we need to know how to load those parameters from the argument list of the dynamic method. First of all, notice the lambda parameters got mapped in order of specification onto the arguments of the generated dynamic method, i.e. o, the first lambda parameter, is the first parameter on the generated method, and so on. This is precisely what the ldArgs argument on Compile stands for: a mapping from the parameter expressions of the lambda onto the concrete indices of the arguments:

o –> 0
a –> 1
b –> 2

Whenever we encounter such a parameter expression during the compilation, we know the position of the argument, so we can emit a ldarg instruction. This is the most trivial Compile override:

sealed class ParameterDynamicExpression : DynamicExpression
{
    protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
    {
        if (!ldArgs.ContainsKey(this))
            throw new InvalidOperationException("Parameter expression " + Name + " is not in scope.");

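        // OpCodes.Ldarg takes a 2-byte argument index; passing an int would emit four bytes,
        // and the two extra zero bytes decode as nop instructions in the IL dumps.
        ilgen.Emit(OpCodes.Ldarg, (short)ldArgs[this]);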
    }
    …
}

This simply says: whenever code needs to be emitted for a ParameterDynamicExpression, look it up in the dictionary to map it onto the formal parameter index on the dynamic method being emitted, and turn it into a ldarg instruction for that argument index. Real full-fledged expression tree implementations would be slightly more complicated because arguments could be hidden when dealing with nested lambdas (quoting, invocation expressions, etc.) but that would take us too far away from home.

For the LambdaDynamicExpression, besides a public Compile method, there will also be an override to the inherited one. It simply asks the Body expression to emit itself (which will result in a one-level high stack containing the evaluation result of the body expression), followed by a ret instruction (simply returning the value evaluated through the Body’s IL code preceding it):

sealed class LambdaDynamicExpression : DynamicExpression
{
    protected internal override void Compile(ILGenerator ilgen, Dictionary<ParameterDynamicExpression, int> ldArgs)
    {
        Body.Compile(ilgen, ldArgs);
        ilgen.Emit(OpCodes.Ret);
    }

}

 

Emitting code

For this post, we’ll omit an implementation for MethodCallDynamicExpression as that will be part of the next post focusing on binders. All we want to get to work today is the I combinator or identity function (yeah, another world-beater :-)):

var x = DynamicExpression.Parameter("x");
var I = DynamicExpression.Lambda(x, x);
Console.WriteLine(I);
Console.WriteLine(I.Compile().DynamicInvoke("Bart"));

In other words, today we’ll focus on the plumbing of emitting the code and wrapping the method in a delegate that can be returned upon calling LambdaDynamicExpression.Compile. The result for the sample above would be equivalent to:

public Delegate Compile()
{
    return new Func<object, object>(delegate(object x) { return x; });
}

The anonymous method (delegate(object x) { return x; }) is the code corresponding to I’s compilation. In IL-terms it would be as simplistic as this:

ldarg.0
ret
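Just to show that these two instructions really are all it takes, here’s a minimal sketch that emits them by hand into a DynamicMethod and wraps the result in a Func<object, object>. It bypasses our expression tree classes entirely; IdentitySketch and BuildIdentity are names invented for this illustration.

```csharp
using System;
using System.Reflection.Emit;

static class IdentitySketch
{
    // Builds object -> object identity by emitting "ldarg.0; ret" directly.
    public static Func<object, object> BuildIdentity()
    {
        var method = new DynamicMethod("I", typeof(object), new Type[] { typeof(object) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0); // push the first argument; the stack is now one high
        il.Emit(OpCodes.Ret);     // return whatever sits on top of the stack

        return (Func<object, object>)method.CreateDelegate(typeof(Func<object, object>));
    }

    static void Main()
    {
        Func<object, object> I = BuildIdentity();
        Console.WriteLine(I("Bart")); // prints "Bart"
    }
}
```

The short-form ldarg.0 opcode is used here because the index is known up front; a general compiler has to emit ldarg with a computed index instead.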

This emitted IL method body then needs to get the signature that says “taking in an object, returning an object”. All of this makes up the dynamic method. But we’re not done yet, as we need to return a delegate to it. In the free translation above, I’ve leveraged the generic System.Func<T1,R> delegate but we only have a limited number of those (up to four arguments), so what if we encounter a method that takes more arguments? Indeed, we’ll need to generate our own delegate types as well. Notice we could cache those very efficiently: the ones with arity (~ number of parameters) up to 4 could simply be mapped onto System.Func delegates with System.Object type parameters, while others would be generated on the fly and kept for reuse if another method with same arity gets compiled. We’ll omit this optimization for now.
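Just to give an idea of what that omitted optimization might look like, here’s a rough sketch of an arity-based cache. DelegateTypeCache, Get and the generate callback are all hypothetical names; the callback stands for whatever produces a custom delegate type for larger arities.

```csharp
using System;
using System.Collections.Generic;

static class DelegateTypeCache
{
    // Open generic Func types, indexed by number of parameters (0 up to 4).
    private static readonly Type[] s_funcs =
    {
        typeof(Func<>), typeof(Func<,>), typeof(Func<,,>), typeof(Func<,,,>), typeof(Func<,,,,>)
    };

    private static readonly Dictionary<int, Type> s_cache = new Dictionary<int, Type>();

    // 'generate' is only called for arities without a matching Func type,
    // and at most once per arity.
    public static Type Get(int arity, Func<Type[], Type, Type> generate)
    {
        lock (s_cache)
        {
            Type result;
            if (s_cache.TryGetValue(arity, out result))
                return result;

            if (arity < s_funcs.Length)
            {
                // arity parameters plus one return type, all System.Object.
                var typeArgs = new Type[arity + 1];
                for (int i = 0; i < typeArgs.Length; i++)
                    typeArgs[i] = typeof(object);
                result = s_funcs[arity].MakeGenericType(typeArgs);
            }
            else
            {
                var args = new Type[arity];
                for (int i = 0; i < args.Length; i++)
                    args[i] = typeof(object);
                result = generate(args, typeof(object));
            }

            s_cache[arity] = result;
            return result;
        }
    }

    static void Main()
    {
        Console.WriteLine(Get(1, null) == typeof(Func<object, object>)); // prints "True"

        // Fake generator, used only to count calls; typeof(Action) is a placeholder result.
        int calls = 0;
        Func<Type[], Type, Type> fake = delegate(Type[] a, Type r) { calls++; return typeof(Action); };
        Get(5, fake);
        Get(5, fake);
        Console.WriteLine(calls); // prints "1"
    }
}
```

Keying on arity alone is enough in our setting because every parameter and the return value are System.Object.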

Here’s what the code to create our own delegate type looks like:

private static Type GetDynamicDelegate(Type[] argumentTypes, Type returnType)
{
    //
    // Assemblies contain modules; generate those with unique names.
    // The generated assembly is runtime only (doesn't need to be saved to disk).
    //
    AssemblyBuilder assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(new AssemblyName(Guid.NewGuid().ToString()), AssemblyBuilderAccess.Run);
    ModuleBuilder moduleBuilder = assemblyBuilder.DefineDynamicModule(Guid.NewGuid().ToString());

    //
    // Our delegate is a private sealed type deriving from MulticastDelegate.
    //
    TypeBuilder typeBuilder = moduleBuilder.DefineType("Lambdas", TypeAttributes.NotPublic | TypeAttributes.Sealed | TypeAttributes.AutoLayout | TypeAttributes.AnsiClass, typeof(MulticastDelegate));

    //
    // The delegate's constructor is a "special name" method with signature (object, native int).
    // It doesn't have a method body by itself; rather, it's supplied by the managed runtime.
    //
    ConstructorBuilder ctorBuilder = typeBuilder.DefineConstructor(MethodAttributes.Public | MethodAttributes.HideBySig | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName, CallingConventions.Standard, new Type[] { typeof(object), typeof(IntPtr) });
    ctorBuilder.SetImplementationFlags(MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

    //
    // We only need the Invoke method (BeginInvoke and EndInvoke are irrelevant for us).
    // It doesn't have a method body by itself; rather, it's supplied by the managed runtime.
    // Here our delegate signature is enforced.
    //
    MethodBuilder invokeMethodBuilder = typeBuilder.DefineMethod("Invoke", MethodAttributes.Public | MethodAttributes.NewSlot | MethodAttributes.HideBySig | MethodAttributes.Virtual, CallingConventions.HasThis, returnType, argumentTypes);
    invokeMethodBuilder.SetImplementationFlags(MethodImplAttributes.Runtime | MethodImplAttributes.Managed);

    //
    // Return the created delegate type.
    // Notice we could cache this for reuse by other dynamic methods.
    //
    return typeBuilder.CreateType();
}

Lots of attribute flags which you can read all about in the CLI specification. I don’t pretend to memorize all of those attributes; why would I if ILDASM makes life just great? :-)

image

This is the screenshot of ILDASM showing a delegate for a method with signature object(object, object) as you can see on the Invoke method. We don’t need any of the asynchronous pattern implementation, so we just need a constructor and Invoke method (see section IIA.13.6 on “Delegates” in the CLI standard). One special thing about those is they don’t have an IL code body as they are “runtime managed” (see IIA.14.4.3 on “Implementation Attributes of Methods” in the CLI standard):

image

Now that we can generate the delegate type, we just need to ask the lambda expression (the root of the tree) to emit its IL code, which will traverse the entire tree. In order to be able to do this, we need to keep mapping information about the lambda parameters mapped onto the formal arguments as mentioned earlier. Here’s the result:

public Delegate Compile()
{
    //
    // Map the lambda parameters onto formal argument indices.
    // Also build up the argument type array.
    //
    var args = new Type[Parameters.Count];
    var ldArgs = new Dictionary<ParameterDynamicExpression, int>();
    for (int i = 0; i < args.Length; i++)
    {
        args[i] = typeof(object);
        ldArgs[Parameters[i]] = i;
    }

    //
    // Compile the expression tree to an IL method body.
    //
    var method = new DynamicMethod("", typeof(object), args);
    var ilgen = method.GetILGenerator();
    Compile(ilgen, ldArgs);

    //
    // Get the delegate matching the dynamic method signature.
    //
    Type dynamicDelegate = GetDynamicDelegate(args, typeof(object));

    //
    // Return a delegate pointing at our dynamic method.
    //
    return method.CreateDelegate(dynamicDelegate);
}

This code should be relatively straightforward. First we create the mapping while building up an array just containing typeof(object)’s (since all arguments are objects in our dynamic world). Next we create a dynamic method with the right signature, produce the IL generator and let the expression compilation do all of the work to emit the IL. And finally we stick the whole thing in a dynamically created delegate that matches the signature, returning that to the caller. Setting a breakpoint on the last line and executing for the “I” identity combinator shows this:

image

This is the IL visualizer we installed earlier. Notice the friendly string representation for the dynamic method shows the signature, which matches the one of the dynamic lambda in the watch window (which is just “I”). Bringing up the IL visualizer shows stunningly complex code:

image

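The NOPs in this dump are an artifact rather than real instructions: OpCodes.Ldarg takes a 2-byte argument index, and the dump was produced by handing Emit an int, which writes four bytes; the two extra zero bytes decode as nop (casting the index to short avoids them). They’re harmless, so ignore them. IL_0000 was emitted by ParameterDynamicExpression.Compile through the compilation of the lambda body. IL_0006 was emitted subsequently by LambdaDynamicExpression.Compile and the stack is nicely in balance. Sure enough, the result printed is: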

image

Woohoo – truly dynamic (though simplistic) execution! Next time: method call expressions and binders.


In the previous post, I outlined the use of the expression trees from the System.Linq.Expressions namespace. Let’s recap to set the scene:

Expression<Func<string, int, int, string>> data = (string s, int a, int b) => s.Substring(a, b);

produces (deep breath)

ParameterExpression s = Expression.Parameter(typeof(string), "s");
ParameterExpression a = Expression.Parameter(typeof(int), "a");
ParameterExpression b = Expression.Parameter(typeof(int), "b");

Expression<Func<string, int, int, string>> data = Expression.Lambda<Func<string, int, int, string>>(
    Expression.Call(s, typeof(string).GetMethod("Substring", new Type[] { typeof(int), typeof(int) }), a, b),
    s, a, b
);

Func&l