Day 12 – Towards cleaner JVM-Interoperability

As some of our readers might remember, interoperability on Rakudo JVM has been described as “basic” by jnthn++ last year. Which is to say, calling methods on Java objects inside Rakudo on the JVM was possible, but it wasn’t always pretty or convenient. As a reminder:

    use java::util::zip::CRC32:from<java>;
    my $crc = CRC32.new();
    for 'Hello, Java'.encode('UTF-8').list {
        $crc.'method/update/(I)V'($_);
    }

In this post I want to describe my journey towards figuring out why the method call looks like that and what we can do to improve it.

What’s wrong with that method call?

The reason behind the unusual method call is that there’s no multi-method dispatch (MMD) for overloaded methods on Java objects from the Perl 6 side. Sure enough, checking the javadoc for java.util.zip.CRC32 reveals that update() is overloaded with three candidates. Readers familiar with the JVM might notice that the method we call on $crc is a method descriptor, defined in the JVM specification as

A method descriptor represents the parameters 
that the method takes and the value that it returns.
-- https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.3.3

In our case this means “a method called update which takes a primitive int and returns void[1]. Clearly Rakudo should figure out on its own which candidate we want, considering Rakudo has MMD on Perl 6 objects and Java has compile-time dispatch on it’s own objects as well. Let’s see what it takes!

The nitty-gritty, if only some of it

Let’s start at the top of the code example. We’re starting with a use statement which by and large does the same in Perl 6 as it does in Perl 5, except we are supplying a longname consisting of the module name and the colonpair :from<java>. The colonpair with key from tells the ModuleLoader to not use the default Perl 6 module loader, but a different one, in our case the JavaModuleLoader.

In JavaModuleLoader::load_module we’re starting to tread towards vm-level code. After making sure we have an interop loader, we call out to Java code with typeForNameFromJar() or typeForName() respectively. This is where we’re leaving NQP code and entering Java code on our trail. Next stop: org.perl6.nqp.runtime.BootJavaInterop

typeForName() and typeForNameFromJar() both do some amount of path-munging to find the .class or .jar file, build a ClassLoader with the path where they found those files and pass the loaded class to getSTableForClass. A STable, or shared table, represents a pairing of a HOW and a REPR, that is, a pairing between a metaobject and the vm-level representation of an object of that type. Creation of a STable happens lazily, via a ClassValue, where the InteropInfo remembers the Java class it represents as well as the computed interop and STable. The important thing we’re looking for is set inside computeInterop, where the documentation rightfully states the gory details begin. The details in question concern themselves with bytecode generation via the framework org.objectweb.asm, although most of the aforementioned details are not particularly important at this stage. What is important though is the following bit:

    HashMap<String, SixModelObject> names = new HashMap< >();
    // ...
    for (int i = 0; i < adaptor.descriptors.size(); i++) {
        String desc = adaptor.descriptors.get(i);
        SixModelObject cr = adaptorUnit.lookupCodeRef(i);
        int s1 = desc.indexOf('/');
        int s2 = desc.indexOf('/', s1+1);
        // create shortname
        String shorten = desc.substring(s1+1, s2);
        // XXX throw shortname => coderef away if shortname is overloaded
        names.put(shorten, names.containsKey(shorten) ? null : cr);
        // but keep the descriptor
        names.put(desc, cr);
    }
    // ...
    freshType.st.MethodCache = names;

The adaptor object is a dataholder for a multitude of things we need while generating the adaptor, for example the name of the adaptor class or a list of the constants it has to contain, or even the CompilationUnit we generated in bytecode. The adaptorUnit is a shorthand for the mentioned CompilationUnit. What happens here is that we construct the MethodCache (which is a HashMap) by extracting the shortname of the method out of the descriptor and adding the shortname as key with the CodeRef as value, as long as we only have one candidate. To be sure we aren’t forgetting anything, we also add the descriptor as key to the MethodCache with the same CodeRef. Thus we have figured out why the method call in the original example has to look the way it does: we don’t even know the method by its shortname.

Great. How do we fix it?

Dynamically. Which is a bit complicated on the JVM, because Java is statically typed. Luckily JSR 292 [2] has been approved some time ago which means we have an instruction available that “supports efficient and flexible execution of method invocations in the absence of static type information”. The basic mechanics of this are as follows:

    • While generating bytecode for the runtime of our language we insert an invokedynamic instruction in a place where we want to be able to decide at runtime which method we want to invoke.
    • This instruction references a slightly special method (usually called a bootstrap method) with a specific signature. Most importantly this method has to return a java.lang.invoke.CallSite.
    • When the invokedynamic instruction is encountered while executing the bytecode, the bootstrap method is called.
    • Afterwards the resulting CallSite is installed in place of the invokedynamic instruction, and on repeated calls the method installed with the CallSite will be invoked instead of the bootstrap method.

To be able to catch methods that we want to treat as multi on the Perl 6 side, we have to change how the adaptors are generated. Recall that we currently only know which methods are overloaded after we generated the adaptor, thus we’re too late to insert an invokedynamic as adaptor method. So we override BootJavaInterop.createAdaptor, and instead of walking through all methods and simply creating an adaptor method directly, we additionally memorize which methods would end up having the same shortname and generate an invokedynamic instruction for those as well.

There’s one more problem though. The fact that we have a shortname that should dispatch to different methods depending on the arguments means that we can’t take full advantage of installing a CallSite. This is because any given CallSite always dispatches to exactly one method, and method signatures in Java are statically typed. Luckily we can instead resolve to a CallSite which installs a fallback method, which does the actual resolution of methods. [3]

To summarize this briefly: Via invokedynamic we install a CallSite that dispatches to a fallback method which converts the Perl 6 arguments to Java objects and then looks among all methods with the same shortname for one that fits the argument types. I won’t paste the org.objectweb.asm instructions for generating byte code here, but the fallback method looks approximately as follow

Object fallback(Object intc, Object incf, Object incsd, Object... args) throws Throwable {
    // some bookkeeping omitted
    Object[] argVals = new Object[args.length];
    for(int i = 0; i < args.length; ++i) {
        // unboxing of Perl 6 types to Java objects happens here
    }

    int handlePos = -1;
    OUTER: for( int i = 0; i < hlist.length; ++i ) {
        Type[] argTypes = Type.getArgumentTypes(((MethodHandle)hlist[i]).type().toMethodDescriptorString());
        if(argTypes.length != argVals.length) {
            // Different amount of parameters never matches
            continue OUTER;
        }
        INNER: for( int j = 1; j < argTypes.length; ++j ) {
            if( !argTypes[j].equals(Type.getType(argVals[j].getClass())) ) {
                switch (argTypes[j].getSort()) {
                    // comparison of types of the unboxed objects with 
                    // the types of the method parameters happens here
                } 
                // if we didn't match the current signature type we can skip this method
                continue OUTER;
            }
        }
        handlePos = i;
        break;
    }
    if( handlePos == -1 ) {
        // die here, we didn't find a matching method
    }

    // create a MethodHandle with a boxed return type
    MethodType objRetType = ((MethodHandle)hlist[handlePos]).type().changeReturnType(Object.class);
    // and convince our adaptor method to return that type instead
    MethodHandle resHandle = ((MethodHandle) hlist[handlePos]).asType(objRetType);

    MethodHandle rfh;

    try {
        // here's where we look for the method to box the return values
    } catch (NoSuchMethodException|IllegalAccessException nsme) {
        throw ExceptionHandling.dieInternal(tc,
            "Couldn't find the method for filtering return values from Java.");
    }

    MethodHandle rethandle = MethodHandles.filterReturnValue(resHandle, (MethodHandle) rfh);
    return ((MethodHandle)rethandle).invokeWithArguments(argVals);
}

The curious may check the whole file here to see the omitted parts (which includes heaps of debug output), although you’d also have to build NQP from this branch for the ability to load .class files as well as a few fixes needed for overriding some of the methods of classes contained in BootJavaInterop if you wanted to compile it.

The result of these changes can be demonstrated with the following classfile Bar.java (which has to be compiled with javac Bar.java)

public class Bar {

    public String whatsMyDesc(int in, String str) {
        String out = "called 'method/answer/(ILjava/lang/String;)Ljava/lang/String;";
        out += "\nparameters are " + in + " and " + str;
        return out;
    }

    public String whatsMyDesc(String str, int in) {
        String out = "called 'method/answer/(Ljava/lang/String;I)Ljava/lang/String";
        out += "\nparameters are " + str + " and " + in;
        return out;
    }
}

and this oneliner:

$ perl6-j -e'use Bar:from<java>;
my $b = Bar.new; 
say $b.whatsMyDesc(1, "hi"); 
say $b.whatsMyDesc("bye", 2)'
called 'method/answer/(ILjava/lang/String;)Ljava/lang/String;
parameters are 1 and hi
called 'method/answer/(Ljava/lang/String;I)Ljava/lang/String;
parameters are bye and 2

Closing thoughts

While working on this I discovered a few old-looking bugs with marshalling reference type return values. This means that the current state can successfully dispatch among shortname-methods, but only value types and java.lang.String can be returned without causing problems, either while marshalling to Perl 6 objects or when calling Perl 6 methods on them. Additionally, there’s a few cases where we can’t sensibly decide which candidate to dispatch to, e.g. when two methods only differ in the byte size of a primitive type. For example one of

    public String foo(int in) {
        // ...
    }
    public String foo(short in) {
        // ...
    }

is called with a Perl 6 type that does not supply byte size as argument, let’s say Int. This is currently resolved by silently choosing the first (that is, declared first) candidate and not mentioning anything about this, but should eventually warn about ambiguity and give the method descriptors for all matching candidates, to facilitate an informed choice by the user. Another, probably the biggest problem that’s not quite resolved is invocation of overloaded constructors. Constructors in Java are a bit more than just static methods and handling of them doesn’t quite work properly yet, although it’s next on my list.

These shortcoming obviously need to be fixed, which means there’s still work to be done, but the thing I set out to do – improving how method calls to Java objects look on the surface in Perl 6 code – is definitely improved.

Addendum: As of today (2015-01-04) the works described in this advent post have been extended by a working mechanism for shortname constructors, the marshalling issues for reference types have been solved and the resulting code has been merged into rakudo/nom. Note that accessors for Java-side fields are not yet implemented on the Perl 6 side via shortname, so you’ll have to use quoted methods of the form “field/get_$fieldName/$descriptor”() and “field/set_$fieldName/$descriptor”($newValue) respectively, where $fieldName is the name of the field in the Java class and $descriptor the corresponding field descriptor. That’s next on my list, though.

4 thoughts on “Day 12 – Towards cleaner JVM-Interoperability

  1. The formating errors in this post are all my fault. I went to fix a simple typo and somehow broke WordPress’s support of escaped symbols in all the code blocks. I have no idea how to get it back, even restoring Peschwa’s version retains the problem I introduced.

    1. I agree that the current way of choosing a candidate by “whichever came first” is not much of sensible resolution method.

      Matching the widest type by default is likely the most sensible thing to do. As far as I know, coercion to native types is not yet implemented, but would probably be the best solution for choosing a specific method by its type. This likely still means warning whenever the type is automatically narrowed to fit the Java method.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s