Java doesn't support regular expression literals like some other languages. Still, you can get halfway.
Regular expression literals
Here's a trivial regular expression example in Java that replaces backslashes with forward slashes:
    public class Slashes {
      public static void main(String[] args) {
        String windowsPath = "foo\\bar\\baz";
        String expression = "\\\\";
        String unixPath = windowsPath.replaceAll(expression, "/");
        System.out.append(windowsPath).append(" -> ").println(unixPath);
      }
    }
In order to replace \ we need to match using \\\\ because backslash is the escape character in both Java strings and regular expressions.
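To make the two layers of escaping concrete, here is a minimal sketch. The class and method names (EscapeLayers, toUnixWithRegex, toUnixQuoted, toUnixLiteral) are made up for illustration and are not part of the sample project. It shows that the literal "\\\\" holds only two characters by the time the regex engine sees it, and that Pattern.quote or String.replace remove the regex layer of escaping when the target is plain text.

```java
import java.util.regex.Pattern;

class EscapeLayers {
  /** The Java literal "\\\\" is the two-character regex \\ which matches one backslash. */
  static String toUnixWithRegex(String path) {
    return path.replaceAll("\\\\", "/");
  }

  /** Pattern.quote handles the regex escaping; only the Java string escape remains. */
  static String toUnixQuoted(String path) {
    return path.replaceAll(Pattern.quote("\\"), "/");
  }

  /** String.replace treats its arguments as plain text, so no regex is involved. */
  static String toUnixLiteral(String path) {
    return path.replace("\\", "/");
  }

  public static void main(String[] args) {
    String windowsPath = "foo\\bar\\baz";
    System.out.println("\\\\".length());              // 2
    System.out.println(toUnixWithRegex(windowsPath)); // foo/bar/baz
    System.out.println(toUnixQuoted(windowsPath));    // foo/bar/baz
    System.out.println(toUnixLiteral(windowsPath));   // foo/bar/baz
  }
}
```

None of these variants gives you compile-time checking of the expression; they only reduce how much escaping you have to write by hand.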
Where a language supports regex literals the expressions become less verbose and syntax errors can be detected by the parser. Here is the same operation in JavaScript:
    var windowsPath = "foo\\bar\\baz";
    // the g flag replaces every match, as replaceAll does in Java
    var expression = /\\/g;
    var unixPath = windowsPath.replace(expression, "/");
Because the literal is part of the language grammar, the parser and static inspection tools can detect syntax errors in it. See this malformed expression:
    /\/
    SyntaxError: Invalid regular expression: missing /
In the Java version the parser is blind to any syntax errors in the regular expression grammar because the compiler treats it as just a string. This problem can be overcome.
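To see what that blindness means in practice, here is a small sketch of when a bad expression is actually noticed in plain Java. The class and method names (RuntimeFailure, syntaxError) are hypothetical. The code compiles cleanly whatever the string contains, and the mistake only surfaces as a PatternSyntaxException when Pattern.compile runs.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RuntimeFailure {
  /**
   * Returns null if the string is a legal regular expression,
   * otherwise the description of the syntax error.
   */
  static String syntaxError(String regex) {
    try {
      Pattern.compile(regex);
      return null;
    } catch (PatternSyntaxException e) {
      // Only reachable at runtime; javac never looked inside the string.
      return e.getDescription();
    }
  }

  public static void main(String[] args) {
    System.out.println(syntaxError("[A-Z]+"));  // legal, prints null
    System.out.println(syntaxError("\\+\\+"));  // legal once escaped, prints null
    System.out.println(syntaxError("++"));      // illegal, prints the error description
  }
}
```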
Validating string literals with annotations
From Java 6 onwards, annotation processing is supported by javac as part of the compile process.
Here's an example class showing the use of a custom RegexSyntax annotation:
    import blog.iae.regex.annotation.RegexSyntax;

    class Foo {
      /** OK */
      @RegexSyntax
      final String matchLatinCapitals = "[A-Z]+";

      /** Not legal regular expression */
      @RegexSyntax
      final String fail = "++";

      /** But it is if escaped */
      final String escaped = "\\+\\+";

      /** Or as a literal */
      @RegexSyntax(flags = java.util.regex.Pattern.LITERAL)
      final String literal = "++";

      String winToUnix(String path) {
        @RegexSyntax
        final String winSlash = "\\\\";
        return path.replaceAll(winSlash, "/");
      }
    }
All are legal Java syntax, but one is not a valid regular expression. Compilation fails:
    >javac -cp regex-annotation-0.0.2.jar Foo.java
    Foo.java:10: error: blog.iae.regex.annotation.RegexSyntax: Dangling meta character '+' near index 0
      final String fail = "++";
                   ^
    ++
    ^
    1 error
The literals are validated at compile time by a Processor implementation:
    // snippet from a Processor implementation

    /**
     * Processes elements annotated with {@link RegexSyntax}.
     */
    @Override
    public boolean process(Set<? extends TypeElement> annotations,
        RoundEnvironment roundEnv) {
      for (Element target : roundEnv.getElementsAnnotatedWith(RegexSyntax.class)) {
        if (isVariable(target) && isFinal(target) && isStringConstant(target)) {
          validateExpression(target);
        }
      }
      return true;
    }

    /**
     * Emits an error if the string is not legal regular expression syntax.
     *
     * @see ProcessingEnvironment
     * @see Pattern
     */
    private void validateExpression(Element element) {
      VariableElement variable = (VariableElement) element;
      String pattern = variable.getConstantValue().toString();
      try {
        int flags = element.getAnnotation(RegexSyntax.class).flags();
        Pattern.compile(pattern, flags);
      } catch (PatternSyntaxException e) {
        String err = RegexSyntax.class.getName() + ": " + e.getLocalizedMessage();
        env.getMessager().printMessage(Kind.ERROR, err, element);
      }
    }
Since you can't annotate a literal, it must be referenced via an annotated variable. The literal must be a constant assigned to a variable that is declared final.
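The declaration of the annotation itself isn't shown here, so the following is only a guess at its shape, not the code from the repository. It assumes a single int element named flags that defaults to 0 (which is what the usage and the call to flags() in the processor suggest), targets that cover the fields and local variables annotated in Foo, and source-only retention, which is an assumption on my part.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical sketch: marks a final String constant as a regular expression. */
@Retention(RetentionPolicy.SOURCE) // assumed: only needed while compiling
@Target({ElementType.FIELD, ElementType.LOCAL_VARIABLE})
@interface RegexSyntax {
  /** Match flags handed to Pattern.compile(String, int); 0 means no flags. */
  int flags() default 0;
}
```

With a declaration along these lines, a bare @RegexSyntax checks the string with no flags, and @RegexSyntax(flags = java.util.regex.Pattern.LITERAL) checks it as a literal pattern.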
See "Getting Started with the Annotation Processing Tool, apt" for a short annotation processor development guide.
IDEs and other compilers
Not all tool chains will run annotation processors automatically, but most can be configured to.
Refer to your tool documentation for specifics.
Sample code
All the sources are available in a public Subversion repository.
Repository: http://illegalargumentexception.googlecode.com/svn/trunk/code/java/
License: MIT
Project: regex-annotation
You can download the prebuilt binary regex-annotaion-0.0.2.zip from the Downloads page.