I spend quite a lot of time writing Java code. I got to thinking about the time I spend implementing, testing, maintaining and just paging over equals/hashCode implementations. These common building blocks tend to work much the same way in most classes, and I wondered whether there was a way to make them go away.
Implementing equals, hashCode and toString
The DataStructure class below shows a class that can be tested for equality and emits a structured string form for debugging purposes.
public class DataStructure {
|
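For illustration, a hand-written version of these three methods might look like the sketch below. The class name ManualDataStructure and its two Object fields (key and value, borrowed from the javap output later in this post) are assumptions, not the original listing.

```java
// Illustrative sketch only: the class name and fields are assumptions.
class ManualDataStructure {

    public Object key;
    public Object value;

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        // Strict type check; an instanceof test is the common alternative.
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        ManualDataStructure other = (ManualDataStructure) obj;
        return nullSafeEquals(key, other.key)
                && nullSafeEquals(value, other.value);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals so that equal objects hash alike.
        int result = 17;
        result = 31 * result + (key == null ? 0 : key.hashCode());
        result = 31 * result + (value == null ? 0 : value.hashCode());
        return result;
    }

    @Override
    public String toString() {
        return "ManualDataStructure[key=" + key + ", value=" + value + "]";
    }

    private static boolean nullSafeEquals(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }
}
```

Every field appears three times here, which is where the maintenance burden comes from: adding a member means touching all three methods.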
There are a number of common ways to implement these methods. None lead to Programming Nirvana.
- Developer manually implements the code (see above). This is repetitive and potentially error-prone work. It runs contrary to the DRY principle.
- The Apache Commons classes like ToStringBuilder take some of the work out of implementing common methods. Developers still need to write and maintain the code and ensure that all appropriate members are accounted for. The API has some introspection options, but these can run afoul of the Java security system.
- Editor/IDE-generated code. Tools like Eclipse can generate equals/hashCode and toString methods. Maintainers need to ensure they maintain or regenerate the code as appropriate to reflect class changes.
Most importantly, existing solutions don't give me an excuse to play around with bytecode manipulation frameworks.
Goals
- Less code, not more. Classes that use the API should be smaller and have less source code to maintain.
- Less error-prone. Using the API should result in fewer opportunities to make mistakes.
- Intuitive and simple. The API should be easy to comprehend and appear logical to someone who hasn't seen it before.
- Compact API. The API should be small, have a minimal number of options and tackle a finite set of problems.
Automating Method Generation
The code that follows is purely proof of concept. I'd advise against using it in production.
Since annotations are de rigueur in Java at the moment, I decided to try and use them to implement the methods in a standard form during or after compilation.
Since all the methods are declared in the java.lang.Object parent, no pre-processing needs to be done to let other classes compile against the new source form.
Here is the new version of the above class.
import boilerplate.equals.AutoEquals;
|
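A sketch of what such an annotated class could look like follows. The annotation declarations are stand-ins so that the snippet compiles on its own; the real ones live in the boilerplate packages, and their retention policy and targets here are assumptions. The field names are taken from the javap output shown further down.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in declarations (assumed): the real annotations are
// boilerplate.equals.AutoEquals and boilerplate.tostring.AutoToString.
// CLASS retention keeps them in the .class file for a post-compilation tool.
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface AutoEquals {}

@Retention(RetentionPolicy.CLASS)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface AutoToString {}

// No equals, hashCode or toString in the source; the processor adds them.
@AutoEquals
@AutoToString
class DataStructure2 {
    public Object key;
    public Object value;
}
```

Until the post-compilation step runs, a class like this simply inherits the identity-based methods from Object.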
The proof of concept implementation uses the ASM API to manipulate the classes in a post-compilation step.
Here is how the code is compiled and processed:
X:\>javac -cp boilerplate.jar DataStructure2.java
X:\>java -jar bpc.jar .
INFORMATION: boilerplate.equals.AutoEquals DataStructure2: Adding equals(Object) and hashCode() methods.
INFORMATION: boilerplate.tostring.AutoToString DataStructure2: Adding toString() method.
The addition of the methods can be confirmed with the Sun JDK javap tool. [Use the -c switch to view the generated instructions.]
X:\>javap -classpath . DataStructure2
Compiled from "DataStructure2.java"
public class DataStructure2 extends java.lang.Object{
    public java.lang.Object key;
    public java.lang.Object value;
    public DataStructure2();
    public boolean equals(java.lang.Object);
    public int hashCode();
    public java.lang.String toString();
}
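The same check could also be made programmatically with reflection. The helper below is a hypothetical sketch, not part of the boilerplate API; it reports whether a class declares a method itself rather than inheriting it.

```java
// Hypothetical helper: reports whether a class declares a method itself
// rather than inheriting it, much as the javap listing shows.
class DeclaredMethodCheck {

    static boolean declares(Class<?> type, String name, Class<?>... params) {
        try {
            // getDeclaredMethod ignores inherited methods.
            type.getDeclaredMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the processed class name as the first argument.
        Class<?> type = Class.forName(args.length > 0 ? args[0] : "java.lang.String");
        System.out.println("equals:   " + declares(type, "equals", Object.class));
        System.out.println("hashCode: " + declares(type, "hashCode"));
        System.out.println("toString: " + declares(type, "toString"));
    }
}
```

Run against a processed class, all three lines should report true; run against the unprocessed class file, all three should report false.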
A finer degree of control can be exercised by annotating the fields instead of the types.
In the code below, only the key field is emitted by toString.
@AutoEquals
|
Issues and Limitations
There are no silver bullets. There are a number of consequences to adding methods to your classes after compilation. The list is probably not comprehensive.
1: Debugging
This approach adds invisible functionality. You can't read the code and you can't step through it with a debugger (unless the tool allows you to step through bytecode). This may make determining the cause of some defects difficult.
For example, if an object referenced itself, it would be possible to introduce a recursive call that resulted in a StackOverflowError. Without code, diagnosing the cause would become more difficult.
Perhaps the API should also be able to generate the source-code equivalent of the bytecode for use by tooling. It is difficult to imagine this approach being popular with tooling developers.
[Cursory testing with Eclipse 3.3 showed no adverse effects when it came to debugging other methods in processed classes. Crude support for this API can be added to Eclipse with an Ant script builder configured in the project properties.]
2: Concurrency
This technique makes no allowance for thread safety. Mutable objects being accessed simultaneously from multiple threads are not good candidates for these techniques. Synchronised blocks probably aren't something that can be reliably generated by inspection of the bytecode. Perhaps a "GuardedBy" mechanism could be used (see Java Concurrency in Practice by Brian Goetz, with Peierls et al.).
3: Unsafe
This may be an abuse of the annotations system. The functionality of the classes is different before and after the processing of the annotations, yet unprocessed classes are valid and may appear to be correct. Contrast this with dependency injection annotations where problems are quickly apparent when running code outside the framework.
This could be fixed by putting the support into the compiler. Alas, there does not seem to be a compiler-agnostic way of doing this (though there are compiler-specific hacks). The annotation processor provides automatic service discovery in the compiler, but it is an error to modify existing artifacts. The other alternative is to process the classes in a custom ClassLoader or using the instrumentation API, but these require special application support or JRE configuration. This is fine in a framework like an EJB container but unsuitable for a general purpose enhancement.
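To give a feel for the instrumentation route, here is a minimal sketch of a Java agent. The class names are hypothetical, and the bytecode rewriting itself (for example with ASM) is deliberately left out.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent; in real use the agent class would normally be
// public and live in its own source file.
class BoilerplateAgent {

    // Invoked by the JVM before main() when the application is started with
    // -javaagent:agent.jar and the jar manifest has a Premain-Class entry.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }
}

class PassThroughTransformer implements ClassFileTransformer {

    @Override
    public byte[] transform(ClassLoader loader, String className,
            Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
            byte[] classfileBuffer) {
        // A real implementation would look for the annotations here and
        // return modified bytes. Returning null leaves the class unchanged.
        return null;
    }
}
```

The need for the -javaagent switch and a manifest entry is the kind of JRE configuration that makes this awkward outside a container.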
An alternative may be to require that native method signatures be added for replacement by the processor class:
@AutoEquals public native boolean equals(Object o);
|
An annotation processor could validate this requirement at compile time. Classes would fail on encountering an unimplemented native method. This would be a hack and lead to misleading error messages, but classes would fail instead of continuing with unintended functionality.
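The fail-fast behaviour can be seen in a small sketch. The annotation declaration is a stand-in (allowing it on methods is an assumption), and NativePlaceholder is a hypothetical class that has not been through the processor.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for the real annotation; the METHOD target is an assumption.
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.TYPE, ElementType.FIELD, ElementType.METHOD})
@interface AutoEquals {}

class NativePlaceholder {
    public Object key;

    // Placeholder for the processor to replace. No native library backs it.
    @AutoEquals
    @Override
    public native boolean equals(Object o);
}

class NativePlaceholderDemo {

    // Returns true if calling equals on the unprocessed class fails fast.
    static boolean failsWhenUnprocessed() {
        try {
            new NativePlaceholder().equals(new NativePlaceholder());
            return false;
        } catch (UnsatisfiedLinkError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails fast: " + failsWhenUnprocessed());
    }
}
```

The UnsatisfiedLinkError says nothing about a missing processing step, which is the misleading error message mentioned above.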
4: Less Control
As might be expected, a degree of fine-grained control is surrendered. One example is control over the order in which members are emitted by toString.
Also affected would be common implementation decisions like how type equality should be handled:
@Override public boolean equals(Object obj) {
|
...versus...
@Override public boolean equals(Object obj) {
|
[See Josh Bloch on Design: A Conversation with Effective Java Author, Josh Bloch [by Bill Venners] for more.]
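The two styles usually contrasted are an instanceof test and a getClass() comparison; whether those are exactly the two variants intended above is an assumption. The sketch below, with hypothetical class names, shows how they differ once a subclass is involved.

```java
// Illustrative classes (all names are assumptions) contrasting two
// common type checks in equals.
class InstanceOfPoint {
    final int x;

    InstanceOfPoint(int x) { this.x = x; }

    @Override
    public boolean equals(Object obj) {
        // Accepts subclasses.
        if (!(obj instanceof InstanceOfPoint)) {
            return false;
        }
        return x == ((InstanceOfPoint) obj).x;
    }

    @Override
    public int hashCode() { return x; }
}

class GetClassPoint {
    final int x;

    GetClassPoint(int x) { this.x = x; }

    @Override
    public boolean equals(Object obj) {
        // Rejects subclasses: the runtime classes must match exactly.
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return x == ((GetClassPoint) obj).x;
    }

    @Override
    public int hashCode() { return x; }
}

class TypeEqualityDemo {
    public static void main(String[] args) {
        // Anonymous subclasses stand in for ordinary subclasses.
        boolean lenient = new InstanceOfPoint(1).equals(new InstanceOfPoint(1) {});
        boolean strict = new GetClassPoint(1).equals(new GetClassPoint(1) {});
        // prints "instanceof: true, getClass: false"
        System.out.println("instanceof: " + lenient + ", getClass: " + strict);
    }
}
```

A generator has to pick one of these behaviours, or expose the choice as an annotation element.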
It would be easy to surprise programmers expecting their individual One True Way of implementing a method. Good documentation and limited option handling via annotation elements seem to be the only solution.
5: Can't Annotate Generated Methods
You can't add additional annotations to the generated methods. No doubt someone will want to.
A native method signature might help here too.
6: Documentation and Source Tooling
In the javadoc, it appears as if the classes do not override equals/hashCode/toString. This is misleading. This problem may apply to other source-level tools too; adding special cases for these methods might be needed all over the place.
To a degree, a native method signature might help here too.
7: Not Extensible
The approach does not permit any extension of the API because it relies on the method signatures being present in java.lang.Object. It would not be possible to apply it to other commonly implemented methods, such as Comparable.compareTo.
Again, a native method signature would help here.
8: Inheritance
The general approach is not amenable to dealing with inherited member fields. The difficulties of implementing equals and hashCode in a class hierarchy are well known (see Effective Java Second Edition by Josh Bloch). This API cannot protect against those dangers. It is possible to introspect parents, but it is doubtful that this would yield any useful information.
[I considered restricting the @AutoEquals annotation to final classes that directly extended Object. Although it provided a degree of safety, this seemed too restrictive in an object-oriented language.]
The toString implementation is immature. It does not emit the type or handle the parent type. These issues can be addressed. In particular, a super call can be used to deal with the parent type. Additional annotation elements could offer some degree of control here.
9: Recursive References
The API offers no protection against recursive references. Further precautions/options are needed to tackle problems like this:
public class RecursionBug {
|
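A minimal sketch of this kind of failure, using a hypothetical class with a field-wise toString of the sort a generator might emit:

```java
// Illustrative sketch; the class and field names are assumptions.
class SelfReferencing {
    Object self;

    // Field-wise toString: concatenating the field calls its toString too.
    @Override
    public String toString() {
        return "SelfReferencing[self=" + self + "]";
    }

    // Returns true if printing an object that points at itself
    // overflows the stack.
    static boolean overflows() {
        SelfReferencing bug = new SelfReferencing();
        bug.self = bug;
        try {
            bug.toString();
            return false;
        } catch (StackOverflowError expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("overflows: " + overflows());
    }
}
```

The same pattern applies to field-wise equals and hashCode, and it also arises with longer cycles where two or more objects refer to each other.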
Conclusions
Obviously, something like this involves trade-offs. Do the time savings and improved reliability from automation outweigh the limitations and potential problems? It may be that this approach is too limited, too specific and Java programmers would be better served with a generalised solution.
Source Code
Source code is available from a Subversion repository.
Repository: http://illegalargumentexception.googlecode.com/svn/trunk/code/java/
License: MIT
Projects:
boilerplate (runtime dependencies);
boilerplateBuild (Ant build scripts);
boilerplateProcessor (class processor);
boilerplateTest (unit tests)
Cool idea. It might not be as useful as we would like, but I'll try to use it in my test code (mostly toString).
Might've known - it's been done before: Dennis Sosnoski's developerWorks article on using annotations and ASM to generate toString.