Wednesday 24 December 2008

Java: automating the equals method

I spend quite a lot of time writing Java code. I got to thinking about the time I spent implementing, testing, maintaining and just paging over equals/hashCode implementations. These common building blocks tend to work much the same way in most classes, and I wondered if there were a way to make them go away.


Implementing equals, hashCode and toString

The DataStructure class below shows a class that can be tested for equality and emits a structured string form for debugging purposes.

public class DataStructure {

  public Object key;
  public Object value;

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (!(obj instanceof DataStructure))
      return false;
    final DataStructure other = (DataStructure) obj;
    if (key == null) {
      if (other.key != null)
        return false;
    } else if (!key.equals(other.key))
      return false;
    if (value == null) {
      if (other.value != null)
        return false;
    } else if (!value.equals(other.value))
      return false;
    return true;
  }

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((key == null) ? 0 : key.hashCode());
    result = prime * result + ((value == null) ? 0 : value.hashCode());
    return result;
  }

  @Override
  public String toString() {
    StringBuilder builder = new StringBuilder();
    builder.append("key=");
    builder.append(key);
    builder.append(" value=");
    builder.append(value);
    return builder.toString();
  }

}
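For comparison, the same hand-written methods can be made shorter with java.util.Objects, a helper class added in Java 7 (after this post was written). This is a sketch rather than the output of any tool, and the class name CompactDataStructure is illustrative. Objects.hash applies the same 31 * result + hash scheme as the hashCode above, so both classes should produce identical hash codes for the same field values.

```java
import java.util.Objects;

// Hand-written equivalent of DataStructure using java.util.Objects (Java 7+),
// which performs the null checks the original spells out by hand.
class CompactDataStructure {

  public Object key;
  public Object value;

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (!(obj instanceof CompactDataStructure))
      return false;
    CompactDataStructure other = (CompactDataStructure) obj;
    // Objects.equals is true when both are null, false when only one is
    return Objects.equals(key, other.key) && Objects.equals(value, other.value);
  }

  @Override
  public int hashCode() {
    // Same result as 31 * (31 * 1 + h(key)) + h(value), with h(null) == 0
    return Objects.hash(key, value);
  }

  @Override
  public String toString() {
    return "key=" + key + " value=" + value;
  }
}
```

The source is shorter, but it is still code that has to be kept in step with the fields by hand.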

There are a number of common ways to implement these methods. None lead to Programming Nirvana.

  • Developer manually implements the code (see above). This is repetitive and potentially error-prone work. It runs contrary to the DRY principle.
  • The Apache Commons classes like ToStringBuilder take some of the work out of implementing common methods. Developers still need to write and maintain the code and ensure that all appropriate members are accounted for. The API has some introspection options, but these can run afoul of the Java security system.
  • Editor/IDE-generated code. Tools like Eclipse can generate equals/hashCode and toString methods. Maintainers need to ensure they maintain or regenerate the code as appropriate to reflect class changes.
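To make the introspection option concrete, here is a minimal sketch of reflection-based implementations in the spirit of the Commons reflective builders. It is not the Commons API: ReflectiveMethods, its methods and the Pair class are hypothetical names. It only considers instance fields declared directly on the runtime class, and the setAccessible call is where a security manager (or, on Java 9 and later, module encapsulation) can refuse access.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: equals/hashCode/toString driven by the instance
// fields declared directly on the object's runtime class.
final class ReflectiveMethods {

  private ReflectiveMethods() {}

  // Instance fields of the class, sorted by name because
  // getDeclaredFields() returns them in no particular order.
  private static List<Field> fields(Class<?> type) {
    List<Field> result = new ArrayList<>();
    for (Field f : type.getDeclaredFields()) {
      if (Modifier.isStatic(f.getModifiers()))
        continue;
      // Can throw SecurityException under a security manager, or
      // InaccessibleObjectException across module boundaries (Java 9+).
      f.setAccessible(true);
      result.add(f);
    }
    result.sort(Comparator.comparing(Field::getName));
    return result;
  }

  static boolean reflectionEquals(Object a, Object b) {
    if (a == b)
      return true;
    if (a == null || b == null || a.getClass() != b.getClass())
      return false;
    try {
      for (Field f : fields(a.getClass()))
        if (!Objects.equals(f.get(a), f.get(b)))
          return false;
      return true;
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }

  static int reflectionHashCode(Object o) {
    int result = 1;
    try {
      for (Field f : fields(o.getClass()))
        result = 31 * result + Objects.hashCode(f.get(o));
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
    return result;
  }

  static String reflectionToString(Object o) {
    StringBuilder sb = new StringBuilder(o.getClass().getSimpleName()).append('[');
    String separator = "";
    try {
      for (Field f : fields(o.getClass())) {
        sb.append(separator).append(f.getName()).append('=').append(f.get(o));
        separator = " ";
      }
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
    return sb.append(']').toString();
  }
}

// The DataStructure fields, with each method delegating to the helper.
class Pair {
  public Object key;
  public Object value;

  @Override public boolean equals(Object obj) { return ReflectiveMethods.reflectionEquals(this, obj); }
  @Override public int hashCode() { return ReflectiveMethods.reflectionHashCode(this); }
  @Override public String toString() { return ReflectiveMethods.reflectionToString(this); }
}
```

Looking fields up on every call is also typically slower than hand-written code, which is one argument for producing the methods ahead of time instead.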

Most importantly, existing solutions don't give me an excuse to play around with bytecode manipulation frameworks.


Goals

  • Less code, not more. Classes that use the API should be smaller and have less source code to maintain.
  • Less error prone. Using the API should result in fewer opportunities to make mistakes.
  • Intuitive and simple. The API should be easy to comprehend and appear logical to someone who hasn't seen it before.
  • Compact API. The API should be small, have a minimal number of options and tackle a finite set of problems.

Automating Method Generation

The code that follows is purely proof of concept. I'd advise against using it in production.

Since annotations are de rigueur in Java at the moment, I decided to try and use them to implement the methods in a standard form during or after compilation. Since all the methods are declared in the java.lang.Object parent, no pre-processing needs to be done to let other classes compile against the new source form.

Here is the new version of the above class.

import boilerplate.equals.AutoEquals;
import boilerplate.tostring.AutoToString;

@AutoEquals
@AutoToString
public class DataStructure2 {

  public Object key;
  public Object value;

}

The proof of concept implementation uses the ASM API to manipulate the classes in a post-compilation step.

Here is how the code is compiled and processed:

X:\>javac -cp boilerplate.jar DataStructure2.java

X:\>java -jar bpc.jar .
INFORMATION: boilerplate.equals.AutoEquals
DataStructure2: Adding equals(Object) and hashCode() methods.
INFORMATION: boilerplate.tostring.AutoToString
DataStructure2: Adding toString() method.

The addition of the methods can be confirmed with the Sun JDK javap tool. [Use the -c switch to view the generated instructions.]

X:\>javap -classpath . DataStructure2
Compiled from "DataStructure2.java"
public class DataStructure2 extends java.lang.Object{
    public java.lang.Object key;
    public java.lang.Object value;
    public DataStructure2();
    public boolean equals(java.lang.Object);
    public int hashCode();
    public java.lang.String toString();
}
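The same check that javap offers by eye can be automated. The sketch below uses hypothetical names and relies on Class.getDeclaredMethod, which only finds methods declared directly on the given class, to report whether equals(Object), hashCode() and toString() were added. A unit test built on something like this could flag a class that was compiled but never passed through the post-processor.

```java
// Hypothetical check that a class declares (rather than inherits) the
// methods a post-processor is supposed to add.
final class ProcessedCheck {

  private ProcessedCheck() {}

  // getDeclaredMethod ignores inherited methods such as Object.equals.
  static boolean declares(Class<?> type, String name, Class<?>... params) {
    try {
      type.getDeclaredMethod(name, params);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  static boolean hasEqualsAndHashCode(Class<?> type) {
    return declares(type, "equals", Object.class) && declares(type, "hashCode");
  }

  static boolean hasToString(Class<?> type) {
    return declares(type, "toString");
  }
}

// Stand-ins for a processed class and an unprocessed one.
class Processed {
  @Override public boolean equals(Object o) { return o instanceof Processed; }
  @Override public int hashCode() { return 0; }
  @Override public String toString() { return "Processed"; }
}

class Unprocessed {
  public Object key;
}
```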

A finer degree of control can be exercised by annotating the fields instead of the types. In the code below, only the key field is emitted by toString.

@AutoEquals
public class DataStructure3 {

  @AutoToString(include=true)
  public Object key;
  @AutoToString(include=false)
  public Object value;

}
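The annotation declarations themselves are not shown in the post. The sketch below assumes a shape consistent with the usage above: an AutoToString that can be placed on a type or a field, with a boolean include element defaulting to true. Instead of generating bytecode, it reads the annotations with reflection at run time, so it uses RUNTIME retention; a tool working on class files could get by with CLASS retention. The selection rule (a field-level annotation wins, otherwise unannotated fields are included only when the type is annotated) is also an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Assumed shape: the real boilerplate.tostring.AutoToString may differ.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface AutoToString {
  boolean include() default true;
}

final class AnnotatedToString {

  private AnnotatedToString() {}

  // Builds "name=value" pairs for the fields selected by @AutoToString.
  // getDeclaredFields() returns fields in no particular order.
  static String of(Object o) {
    Class<?> type = o.getClass();
    boolean typeAnnotated = type.isAnnotationPresent(AutoToString.class);
    StringBuilder sb = new StringBuilder();
    for (Field f : type.getDeclaredFields()) {
      if (Modifier.isStatic(f.getModifiers()))
        continue;
      AutoToString a = f.getAnnotation(AutoToString.class);
      // A field-level annotation wins; otherwise fall back to the type level.
      boolean include = (a != null) ? a.include() : typeAnnotated;
      if (!include)
        continue;
      try {
        f.setAccessible(true);
        if (sb.length() > 0)
          sb.append(' ');
        sb.append(f.getName()).append('=').append(f.get(o));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    return sb.toString();
  }
}

// Field-level control, as in DataStructure3: only key is emitted.
class DataStructure3 {
  @AutoToString(include = true)
  public Object key;
  @AutoToString(include = false)
  public Object value;

  @Override
  public String toString() {
    return AnnotatedToString.of(this);
  }
}

// Type-level annotation: every instance field is emitted.
@AutoToString
class TypeLevel {
  public Object key;
}
```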

Issues and Limitations

There are no silver bullets. There are a number of consequences to adding methods to your classes after compilation. The list is probably not comprehensive.

1: Debugging

This approach adds invisible functionality. You can't read the code and you can't step through it with a debugger (unless the tool allows you to step through bytecode). This may make the cause of some defects hard to determine. For example, if an object referenced itself, a generated method could end up in a recursive call that resulted in a StackOverflowError. With no source code to read, diagnosing the cause would be more difficult.

Perhaps the API should also be able to generate the source-code equivalent of the bytecode for use by tooling. It is difficult to imagine this approach being popular with tooling developers.

[Cursory testing with Eclipse 3.3 showed no adverse effects when debugging other methods in processed classes. Crude support for this API can be added to Eclipse by configuring an Ant script builder in the project properties.]

2: Concurrency

This technique makes no allowance for thread safety. Mutable objects accessed simultaneously from multiple threads are not good candidates for these techniques. Synchronised blocks probably aren't something that can be reliably generated by inspecting the bytecode. Perhaps a "GuardedBy" mechanism could be used (see Java Concurrency in Practice by Brian Goetz, with Peierls et al.).
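To show the kind of decision involved, here is a hand-written sketch of equals and hashCode for a mutable class guarded by its own lock. It copies the other object's state under that object's lock before taking its own, so two threads running a.equals(b) and b.equals(a) never hold both locks and cannot deadlock each other. The class name is illustrative, and the GuardedBy convention is written as a comment rather than an annotation to avoid a dependency.

```java
// Mutable counter whose state is guarded by the instance lock.
class SafeCounter {

  // GuardedBy("this")
  private long count;

  public synchronized void increment() {
    count++;
  }

  // Reads the guarded state under this object's lock.
  private synchronized long snapshot() {
    return count;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (!(obj instanceof SafeCounter))
      return false;
    long theirs = ((SafeCounter) obj).snapshot(); // takes and releases obj's lock
    return snapshot() == theirs;                  // then takes this object's lock
  }

  @Override
  public int hashCode() {
    return Long.hashCode(snapshot());
  }
}
```

Even so, the answer can be out of date by the time the caller sees it, and a hashCode that depends on mutable state still makes such objects unsafe as keys in hash-based collections while they are being mutated.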

3: Unsafe

This may be an abuse of the annotations system. The functionality of the classes is different before and after the processing of the annotations, yet unprocessed classes are valid and may appear to be correct. Contrast this with dependency injection annotations where problems are quickly apparent when running code outside the framework.

This could be fixed by putting the support into the compiler. Alas, there does not seem to be a compiler-agnostic way of doing this (though there are compiler-specific hacks). The annotation processor provides automatic service discovery in the compiler, but it is an error to modify existing artifacts. The other alternative is to process the classes in a custom ClassLoader or using the instrumentation API, but these require special application support or JRE configuration. This is fine in a framework like an EJB container but unsuitable for a general purpose enhancement.

An alternative may be to require that native method signatures be added for replacement by the processor class:

    @AutoEquals public native boolean equals(Object o);

An annotation processor could validate this requirement at compile time. Unprocessed classes would fail with an UnsatisfiedLinkError as soon as an unimplemented native method was called. This would be a hack and would lead to misleading error messages, but classes would fail instead of continuing with unintended functionality.
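A sketch of what the native-method convention might look like, assuming a hypothetical method-level AutoEquals annotation (the annotation used earlier in the post is placed on types). If the post-processor never replaces the native declarations, the JVM cannot find an implementation and the first call throws UnsatisfiedLinkError.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker for methods the post-processor should implement.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface AutoEquals {}

// Until a post-processor replaces these declarations, no implementation
// exists, so calling either method fails loudly instead of silently
// falling back to Object's identity-based behaviour.
class NativeStub {
  public Object key;

  @AutoEquals @Override public native boolean equals(Object o);
  @AutoEquals @Override public native int hashCode();
}
```

The error talks about native linkage rather than a missed build step, which is the misleading message mentioned above.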

4: Less Control

As might be expected, a degree of fine-grained control is surrendered: for example, control over the order in which members are emitted by toString.

Also affected would be common implementation decisions like how type equality should be handled:

    @Override public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || obj.getClass() != this.getClass())
            return false;
        //etcetera

...versus...

    @Override public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof ThisClass))
            return false;
        //etcetera

[See "Josh Bloch on Design: A Conversation with Effective Java Author, Josh Bloch" by Bill Venners for more.]
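The practical difference between the two styles shows up once a subclass adds state. In the illustrative classes below, the instanceof style lets a Point claim to equal a ColorPoint while the ColorPoint denies it, breaking the symmetry the equals contract requires. The getClass style stays symmetric, at the cost of never treating a subclass instance as equal to a parent instance, even when the subclass adds no state.

```java
// instanceof style: the parent accepts any subclass instance.
class Point {
  final int x;
  Point(int x) { this.x = x; }

  @Override public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof Point)) return false;
    return x == ((Point) obj).x;
  }
  @Override public int hashCode() { return x; }
}

class ColorPoint extends Point {
  final String color;
  ColorPoint(int x, String color) { super(x); this.color = color; }

  @Override public boolean equals(Object obj) {
    if (!(obj instanceof ColorPoint)) return false; // rejects plain Points
    return super.equals(obj) && color.equals(((ColorPoint) obj).color);
  }
  @Override public int hashCode() { return 31 * super.hashCode() + color.hashCode(); }
}

// getClass style: instances of different classes never compare equal.
class StrictPoint {
  final int x;
  StrictPoint(int x) { this.x = x; }

  @Override public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null || obj.getClass() != getClass()) return false;
    return x == ((StrictPoint) obj).x;
  }
  @Override public int hashCode() { return x; }
}

class StrictColorPoint extends StrictPoint {
  final String color;
  StrictColorPoint(int x, String color) { super(x); this.color = color; }

  @Override public boolean equals(Object obj) {
    // super.equals has already checked the exact class, so the cast is safe
    return super.equals(obj) && color.equals(((StrictColorPoint) obj).color);
  }
  @Override public int hashCode() { return 31 * super.hashCode() + color.hashCode(); }
}
```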

It would be easy to surprise programmers expecting their individual One True Way of implementing a method. Good documentation and limited option handling via annotation elements seem to be the only solution.

5: Can't Annotate Generated Methods

You can't add additional annotations to the generated methods. No doubt someone will want to.

A native method signature might help here too.

6: Documentation and Source Tooling

In the javadoc, it appears as if the classes do not override equals/hashCode/toString. This is misleading. This problem may apply to other source-level tools too; adding special cases for these methods might be needed all over the place.

To a degree, a native method signature might help here too.

7: Not Extensible

The approach does not permit any extension of the API because it relies on the method signatures being present in java.lang.Object. It would not be possible to apply it to other commonly implemented methods, such as Comparable.compareTo.

Again, a native method signature would help here.
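For comparison, here is the kind of compareTo a generator would have to produce (KeyValue is an illustrative name). Because Comparable.compareTo is not declared on Object, the class has to declare implements Comparable in source anyway, and the field order in the comparator is a semantic choice rather than boilerplate. The Comparable documentation also strongly recommends, though it does not require, that the ordering be consistent with equals.

```java
import java.util.Comparator;

// Hand-written natural ordering: by key, then by value.
// Swapping the two comparisons would give a different ordering.
class KeyValue implements Comparable<KeyValue> {

  // Null keys are not handled and would throw NullPointerException.
  private static final Comparator<KeyValue> ORDER =
      Comparator.comparing((KeyValue kv) -> kv.key)
                .thenComparingInt(kv -> kv.value);

  final String key;
  final int value;

  KeyValue(String key, int value) {
    this.key = key;
    this.value = value;
  }

  @Override
  public int compareTo(KeyValue other) {
    return ORDER.compare(this, other);
  }
}
```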

8: Inheritance

The general approach is not amenable to dealing with inherited member fields.

The difficulties of implementing equals and hashCode in a class hierarchy are well known (see Effective Java, Second Edition, by Josh Bloch). This API cannot protect against those dangers. It is possible to introspect parent classes, but it is doubtful that this would yield any useful information.

[I considered restricting the @AutoEquals annotation to final classes that directly extended Object. Although it provided a degree of safety, this seemed too restrictive in an object oriented language.]

The toString implementation is immature. It does not emit the type or handle the parent type. These issues can be addressed. In particular, a super call can be used to deal with the parent type. Additional annotation elements could offer some degree of control here.
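One possible shape for the super-call approach, sketched with hypothetical names: each class appends only the fields it declares through a protected hook and calls super for its parent's fields, while the base class emits the runtime type once via getClass().

```java
// Base class: emits the runtime type once, then delegates field output
// to an overridable hook.
class Base {
  public Object key;

  // Appends this class's fields; subclasses call super first.
  protected void appendFields(StringBuilder sb) {
    sb.append("key=").append(key);
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder(getClass().getSimpleName()).append('[');
    appendFields(sb);
    return sb.append(']').toString();
  }
}

class Derived extends Base {
  public Object value;

  @Override
  protected void appendFields(StringBuilder sb) {
    super.appendFields(sb); // parent's fields first
    sb.append(" value=").append(value);
  }
}
```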

9: Recursive References

The API offers no protection against recursive references. Further precautions/options are needed to tackle problems like this:

public class RecursionBug {

    public Object value;

    @Override
    public String toString() {
        return value.toString();
    }

    /** Causes StackOverflowError */
    public static void main(String[] args) {
        RecursionBug o = new RecursionBug();
        o.value = o;
        o.toString();
    }

}
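One possible precaution, sketched here for toString only, is a per-thread set of objects whose toString is already running. It uses an identity-based set (Collections.newSetFromMap over an IdentityHashMap) so the guard never calls the object's own equals or hashCode, which might recurse too. The class name and the "[...]" placeholder are illustrative choices, not part of the boilerplate API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class GuardedNode {

  // Objects whose toString is in progress on the current thread,
  // compared by identity rather than by equals/hashCode.
  private static final ThreadLocal<Set<Object>> IN_PROGRESS =
      ThreadLocal.withInitial(
          () -> Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()));

  public Object value;

  @Override
  public String toString() {
    Set<Object> inProgress = IN_PROGRESS.get();
    if (!inProgress.add(this)) {
      // This instance is already being printed further up the stack.
      return "GuardedNode[...]";
    }
    try {
      return "GuardedNode[value=" + value + "]";
    } finally {
      inProgress.remove(this);
    }
  }
}
```

The guard only stops infinite recursion; deeply nested but acyclic structures can still exhaust the stack.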

Conclusions

Obviously, something like this involves trade-offs. Do the time savings and improved reliability from automation outweigh the limitations and potential problems? It may be that this approach is too limited and too specific, and that Java programmers would be better served by a generalised solution.


Source Code

Source code is available from a Subversion repository.

Repository: http://illegalargumentexception.googlecode.com/svn/trunk/code/java/
License: MIT
Projects: boilerplate (runtime dependencies); boilerplateBuild (Ant build scripts); boilerplateProcessor (class processor); boilerplateTest (unit tests)

2 comments:

  1. cool idea, it might not be usefull as much as we would like, but ill try to use it in my test code(mostly toString).
