Codegen vs. Reflection!

Laurence Gonsalves prefers compile-time code generation over runtime reflection. He writes,
I think all-in-one toolchains (eg: IDEs) are actually part of the reason Java has both not enough magic (like Rob says) and also the wrong kind of magic (runtime reflection). If there was a standard way to plug code generators into Java (or better yet: a Common Lisp-like macro system) then the vast majority of places where people use runtime-magic (reflection) today could be replaced by compile-time magic, and your static analysis tools (ie: compile-time type checking, code navigation tools, refactoring tools, etc.) would actually still work.
Java has a standard way to generate code: the annotation processing tool. Laurence acknowledges APT, but claims it's useless for his purposes:
[...] I have to say that annotation processors are like lobotomized macros. I did some experiments with them a few years ago (when Java 5 first came out), and it was pretty difficult to do any useful metaprogramming with them without doing really nasty things.
One core claim Laurence makes is that refactoring still works if there's a standard way to generate code. I disagree.

Code Generation

Suppose I'm using code generation to do JSON-Java object mapping. One hypothetical code generator takes a JSON text:
{
    "name": "Jesse Wilson",
    "age": 28,
    "favorite taco flavors": [ "fish", "steak" ]
}
...and generates a Java class:
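/* GENERATED FILE! DO NOT EDIT */

import com.google.common.collect.ImmutableList;

public class TacoCustomer {
    private final String name;
    private final double age;
    private final ImmutableList<String> favoriteTacoFlavors;

    // Constructor added so the final fields are initialized; presumably called by parseJson.
    private TacoCustomer(String name, double age, ImmutableList<String> favoriteTacoFlavors) {
        this.name = name;
        this.age = age;
        this.favoriteTacoFlavors = favoriteTacoFlavors;
    }

    public String getName() {
        return name;
    }
    public double getAge() {
        return age;
    }
    public ImmutableList<String> getFavoriteTacoFlavors() {
        return favoriteTacoFlavors;
    }
    public static TacoCustomer parseJson(String jsonText) {
        ...
    }
}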
This looks fine enough, almost like code a human would write. But when I exercise my refactoring tool to rename getFavoriteTacoFlavors to getFavoriteTacos, I face one of two ugly situations:
  • It appears to work. But later when I regenerate the code, my edits are clobbered and my application no longer builds. Yuck.
  • It doesn't work, because the generator made the file non-writable. Now I need to figure out the code generator, use its name mapping to rename accessors, regenerate, and manually fix the callers.
Code generation is bad because you cannot edit your code directly. You need to edit something upstream.
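To make "its name mapping" concrete, here is a minimal sketch of how such a generator might derive accessor names from JSON keys. The class AccessorNameMapping and its method are hypothetical, invented for illustration; a real generator could use a different rule entirely.

```java
/** Hypothetical fragment of a code generator: maps JSON keys to accessor names. */
class AccessorNameMapping {

    /** Turns a JSON key such as "favorite taco flavors" into "getFavoriteTacoFlavors". */
    static String accessorName(String jsonKey) {
        StringBuilder name = new StringBuilder("get");
        for (String word : jsonKey.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty key produces one empty word; skip it
            }
            name.append(Character.toUpperCase(word.charAt(0)));
            name.append(word.substring(1));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(accessorName("name"));                  // prints "getName"
        System.out.println(accessorName("favorite taco flavors")); // prints "getFavoriteTacoFlavors"
    }
}
```

With a rule like this, renaming getFavoriteTacoFlavors to getFavoriteTacos means changing the upstream key or the mapping itself, not the generated file.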

Runtime Reflection

To contrast, here's how the equivalent runtime-reflection solution might look. We manually write an interface that describes our JSON schema, plus some pretty annotations to define the mapping:
public interface TacoCustomer {
    @JsonProperty("name")
    String getName();

    @JsonProperty("age")
    double getAge();

    @JsonProperty("favorite taco flavors")
    ImmutableList<String> getFavoriteTacoFlavors();
}
Exercise the runtime-generated code with a shared, type-safe entry point. It might be invoked like this:
        String jsonText = ...
        TacoCustomer jesse = new JsonToJava().parse(TacoCustomer.class, jsonText);
Refactoring this interface just works, and I can even attach meaningful Javadoc to its methods. The interface can be an inner interface of another class. And it can host manually-coded inner classes.

Runtime reflection is good because it lets you put your code wherever you want it, and edit it directly.
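For the curious, here is a rough sketch of how an entry point like JsonToJava might work underneath, using java.lang.reflect.Proxy. Everything in it is an assumption made for illustration: the JsonProperty annotation is declared locally, the bind method stands in for parse, the JSON text is assumed to be already parsed into a Map, and java.util.List replaces ImmutableList so the sketch needs no libraries.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReflectionBindingSketch {

    /** Hypothetical annotation; RUNTIME retention keeps it visible to reflection. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface JsonProperty {
        String value();
    }

    interface TacoCustomer {
        @JsonProperty("name")
        String getName();

        @JsonProperty("age")
        double getAge();

        @JsonProperty("favorite taco flavors")
        List<String> getFavoriteTacoFlavors();
    }

    /**
     * Returns a proxy implementing the given interface. Each annotated accessor
     * looks up its JSON key in the map. Values must already have the declared
     * return types; no conversion is attempted in this sketch.
     */
    static <T> T bind(Class<T> type, Map<String, Object> json) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // Delegate equals, hashCode and toString to the backing map.
                return method.invoke(json, args);
            }
            JsonProperty property = method.getAnnotation(JsonProperty.class);
            if (property == null) {
                // This is only discovered when the method is called at runtime.
                throw new UnsupportedOperationException(
                        "No @JsonProperty on " + method.getName());
            }
            return json.get(property.value());
        };
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(instance);
    }

    public static void main(String[] args) {
        Map<String, Object> json = new HashMap<>();
        json.put("name", "Jesse Wilson");
        json.put("age", 28.0);
        json.put("favorite taco flavors", Arrays.asList("fish", "steak"));

        TacoCustomer jesse = bind(TacoCustomer.class, json);
        // prints "Jesse Wilson likes [fish, steak]"
        System.out.println(jesse.getName() + " likes " + jesse.getFavoriteTacoFlavors());
    }
}
```

Because the accessor names never appear as strings, a rename refactoring on the interface updates the callers and leaves the binding untouched; only the annotation values tie the interface to the JSON.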

10 comments:

Laurence said...

"When runtime reflection is used to implement a feature, refactoring just works." Sorry, but this is demonstrably wrong. Yes, there are certain ways you can use reflection that don't get screwed up by specific kinds of refactorings, but it only "just works" if you're very careful about the type of reflection you use and the type of refactorings you perform against that code. Dynamically conforming to an interface and then performing refactorings that only affect that interface is one example of this.

That's far from the only way people use reflection, however, and most of the other things that people do with reflection consistently break refactoring. For example, performing actions based on whether methods with certain names or name patterns exist in an object. JUnit does this (instantiating and executing TestCases once per method named "test*"), as does Java's Serialization API (readObject, writeObject, etc.). Many Webwork injectors also do this. In these sorts of cases if you decide to perform a "rename method" refactoring the method will become disconnected from its callers, and you'll only find out things have gone wrong at runtime.
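As an illustration of that failure mode, here is a minimal sketch of name-based discovery in the JUnit 3 style. It is not JUnit's actual implementation; the class and method names are made up, and the second method simulates a typo or a careless rename.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class NameConventionSketch {

    /** Test methods are found by name, so the compiler cannot check them. */
    static class TacoTests {
        public void testParsesName() {}
        public void tsetParsesAge() {} // misspelled prefix: silently never discovered
    }

    /** Returns the names of public no-argument methods whose names start with "test". */
    static List<String> discoverTests(Class<?> testClass) {
        List<String> names = new ArrayList<>();
        for (Method method : testClass.getMethods()) {
            if (method.getName().startsWith("test") && method.getParameterCount() == 0) {
                names.add(method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints "[testParsesName]"; tsetParsesAge is skipped without any warning
        System.out.println(discoverTests(TacoTests.class));
    }
}
```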

Even in your specific example, the worst case you get with code generation (ie: when you don't have proper refactoring support for your code generator and your refactoring affects the generated code) is a build failure, which I'd argue is much better than the behavior you get when refactoring affects your reflection: broken runtime behavior.

Your example does illustrate another problem with reflection that I hadn't mentioned, though: with reflection you can't write code that's portable across languages. Suppose part of your software is written in some other language, like C++ or Python, and you need to send TacoCustomer JSON objects between these components. The reflection approach makes the TacoCustomer definition non-portable to other languages. With code generation you can generate code for whatever language you want. [One nit with your code-generation example: you'd typically use some sort of IDL (think .proto files) as the source for your code generator, not an example JSON input.]

In fact, if you didn't care about cross-language portability, and were happy with the interface constraints this places upon you, you could use compile-time annotations that would look to the user exactly like your runtime annotations.

The reason I complained about APT is that it doesn't work for cases much more complicated than your example. I once tried to reduce some of the boilerplate imposed by the Visitor Pattern. This would have been easy with a stand-alone code generator, but I couldn't find an acceptable way to do it with APT. Even for your much simpler example APT is a little awkward (exactly like the reflection version): If you were hand writing this you probably wouldn't design an API that involved passing around TacoCustomer.class.

Laurence said...

I forgot to mention why APT (ie: annotation based code generation) would be better for your example than reflection. Your post seems to imply that I was just complaining about reflection breaking refactoring, but I was actually complaining about it breaking a wide variety of things that rely on static analysis, refactoring tools just being an example.

Another sort of thing that reflection tends to break is compile-time error detection. Your example illustrates this nicely. With reflection, if you forget to annotate one of your methods or have a method whose signature makes no sense (eg: returns void, takes extra parameters, etc.) you won't learn of the problem until you try to actually use one of these things at runtime. With APT you'd get a build failure.
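A compile-time check of this kind could look roughly like the sketch below, written against the standard javax.annotation.processing API. The annotation name com.example.JsonProperty and the class names are hypothetical, only the bad-signature cases are covered (detecting a missing annotation would need a scan of the enclosing interface), and how the processor is registered with the compiler is left out.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.type.TypeKind;
import javax.tools.Diagnostic;

class CompileTimeCheckSketch {

    /** Returns an error message for a nonsensical accessor signature, or null if it is fine. */
    static String describeProblem(boolean returnsVoid, int parameterCount) {
        if (returnsVoid) {
            return "@JsonProperty accessor must return a value";
        }
        if (parameterCount != 0) {
            return "@JsonProperty accessor must not take parameters";
        }
        return null;
    }

    /** Reports badly shaped @JsonProperty methods as compile errors. */
    @SupportedAnnotationTypes("com.example.JsonProperty") // hypothetical annotation name
    public static class JsonPropertyChecker extends AbstractProcessor {

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (TypeElement annotation : annotations) {
                for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                    if (!(element instanceof ExecutableElement)) {
                        continue; // only methods are checked in this sketch
                    }
                    ExecutableElement method = (ExecutableElement) element;
                    String problem = describeProblem(
                            method.getReturnType().getKind() == TypeKind.VOID,
                            method.getParameters().size());
                    if (problem != null) {
                        // An ERROR diagnostic fails the build and points at the method.
                        processingEnv.getMessager()
                                .printMessage(Diagnostic.Kind.ERROR, problem, method);
                    }
                }
            }
            return false; // do not claim the annotation; other processors may still see it
        }
    }
}
```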

jessewilson said...

Laurence, We agree that much bad magic has been implemented with both reflection and code generation.

But whenever I need a tasteful bit of magic in new code, I always prefer runtime reflection over compile-time code generation.

Your suggested benefit of using code generation to target multiple language targets is lame. Protocol buffers do this and it leads to a cumbersome API in every target language. It's even worse for DOM and Corba IDL.

Laurence said...

"But whenever I need a tasteful bit of magic in new code, I always prefer runtime reflection over compile-time code generation."

Why? You give no reasons, and as I pointed out, even your example is strictly better when code generation via APT is used rather than runtime reflection because you get compile-time checking.

"Your suggested benefit of using code generation to target multiple language targets is lame. Protocol buffers do this and it leads to a cumbersome API in every target language."

Saying it's lame simply because you disagree is lame.

You're confusing the old protocol buffer API, which I agree was pretty terrible, with what is actually possible. The old protocol buffer API was just a straight port of the C++ API and was written by someone who apparently had never used Java Collections. Its API didn't need to be so cumbersome, and the newer Java API is in fact a significant improvement. With code generation the API can be whatever you want it to be.

With reflection (and also APT -- this is my complaint with it) you actually have a lot less flexibility in your API meaning that you often have no choice but to produce something very awkward. eg: requiring users to name methods to match a pattern, shoehorn parameters into annotations, or pass Class objects around (the last isn't always bad, but it's certainly a bad smell).

swankjesse said...

Laurence, I don't think I'm going to be able to convince you. I think we fundamentally disagree on what a good API looks like.

For all of its faults, JUnit's reflection-based "test" convention is convenient. TestNG's reflection-based annotation approach is even better. I have yet to see a code generation approach that improves upon this.

There are plenty of well-exercised APIs that demonstrate the pure awesomeness of runtime reflection: hibernate, JAX-RS, and TestNG.

Can you remind me of the APIs that show off the strengths of compile-time code generation?

Laurence said...

"I think we fundamentally disagree on what a good API looks like."

So is there something specific about the protocol buffer API that you find cumbersome?

"For all of its faults, JUnit's reflection-based "test" convention is convenient. TestNG's reflection-based annotation approach is even better."

Yes, JUnit's reflection-based convention is convenient, until you accidentally mistype a test method name and don't realize that your tests are green because some of your test methods aren't even being run. That caveat aside, I do think reflection is acceptable for development tools (including test frameworks) because runtime failures in such tools are practically build failures anyway. My only real complaint with using reflection in test frameworks is that it makes a lot of Java developers assume that it's okay to use it anywhere.

I've never used hibernate or JAX-RS so I can't really say anything about them, though if they sacrifice compile-time checks I doubt that I'd be happy using them.

"Can you remind me of the APIs that show off the strengths of compile-time code generation?"

The whole reason this topic even came up was that Java does not have adequate support for code generation, so asking me to provide examples of Java APIs that show off the strengths of compile-time code generation is a bit like if this were the 1700s and you asked me for an example of a horseless carriage that's better than a "horsed" carriage.

That said, can you show me an example of a LL(k) parsing framework (like ANTLR or JavaCC) that doesn't use code-generation?

You might also want to look into what's possible with macros in Common Lisp. If Java had such macros it would be easy to make a testing framework like JUnit or TestNG without using reflection. My guess is that equivalents to the other libraries you mentioned would also be possible (though, again, I've never used them) and you wouldn't have to sacrifice compile-time checks or other static analysis tools.

swankjesse said...

I agree, ANTLR demonstrates code generation can be wildly successful.

How would you improve code generation? It seems like Eclipse is trying to make the tooling story easier. Have they failed?

btilford said...

Why not just get the best of both worlds and write an annotation pre-processor? They were added in JDK 6 and are what Hibernate and CXF both use to generate code.

Laurence said...

I wasn't saying that the IDEs are the sole source of problems with code generation in Java. Many of the problems predate modern IDEs like Eclipse and IDEA and are problems with Java itself.

An easy problem to fix would be to add something equivalent to C's #line directive. Right now, if you generate Java code, compilation errors (from your Java compiler) and runtime errors all refer to the generated Java code. This is rarely what you want. You generally want these errors to instead refer to the hand-written original source.

JSR-45 was supposed to be a solution to this problem, but it was over-designed and never properly implemented. (In particular, I think there still isn't a way to give an SMAP file to javac so that the compilation errors are correct.)

A more complicated problem is that Java compilation units can have cyclic dependencies. This means you have to give a bundle of source files to the Java compiler at once, and this can complicate more advanced uses of code generation. Most Java code generators have to effectively be blind to what's going on in Java code. You run the code generator first, and then you run the Java compiler on all of the code after. This is really just a work-around, though. It would be better if you could somehow integrate into the Java compilation process so that some compilation units could be in Java, and others in other languages. This would let the compilation units written in other languages actually examine the Java type system so they could make smarter decisions (particularly when reporting errors or warnings).

Eclipse may be improving the refactoring support for other languages, which is great, but the issue I was referring to when I mentioned IDEs is more fundamental than that. The IDEs complicate things for code generators because it isn't good enough to just have the best support possible for javac, as a large fraction of Java developers now use Eclipse or IDEA for compiling their code (at least part of the time).

So even if javac added some sort of API so that code generators could plug into it, the IDEs would either lag behind (like they did with annotation processors), or they'd have a different, incompatible API (like the incompatible plugin APIs in IDEA and Eclipse).

Also, in the absence of such an API (ie: the current state of the world), the IDEs cause difficulty for code generators because of the practically jingoist "I don't want anything else in my toolchain but my IDE" mentality that seems to be very common among IDE users. The problem is that the IDEs make building happen "by magic", but only if 100% of the code is Java. The fix for this is either to make the magic work virtually 100% of the time or to eliminate the magic entirely.

Laurence said...

Finally, everything I've said so far has only been about build issues, but the problems with IDEs are actually at every level. The IDEs fragment the Java development platform such that any tool that's going to operate at compile time or any DSL will need support for the most popular IDEs as well as plain old javac. This wouldn't be terrible, except that the IDEs make it really hard to add support for new build tools and languages. The way language editors work in Eclipse is so poorly factored that it's laughable, especially considering that one of Eclipse's strengths is supposed to be its refactoring tools. (eg: the fact that JavaScript support was added by copying all of the code for the Java support and then modifying it. Does that sound like good software engineering?)

If the IDE creators care about making this better they need to make this as simple as possible. It should not take thousands of lines of code in Java with hundreds of lines of XML configuration files just to make an editor in Eclipse that properly syntax highlights and indents code in a new language. (for a point of reference: adding support for a new language to vim usually entails only a few hundred lines of configuration, often much less) Ideally, it shouldn't be necessary for tool creators to learn a separate API for Eclipse, IDEA, and javac, either.