The JDK 8 Developer Preview came out recently, so I thought I would take a look at the API enhancements. This post demonstrates how refactoring to use the new types can change code.
This post refers to JDK build 1.8.0-ea-b106. IntelliJ IDEA has good Java 8 syntax support.
An HTML character reference encoder
This type encodes code points in a given set as character references. It could be used when emitting character data in a non-Unicode encoding (not that you would want to).
    package demo.java7;

    import java.util.Arrays;

    import static java.lang.Character.charCount;
    import static java.lang.Character.codePointAt;
    import static java.lang.Integer.toHexString;

    public final class CharReferences {
        // Arguments to Special must be in ascending order for the binary search.
        public static final CodePointSet HTML = new Special('\"', '&', '\'', '<', '>');
        public static final CodePointSet NOT_ASCII = new GreaterThan(127);
        public static final CodePointSet NOT_ISO8859_1 = new GreaterThan(255);

        private final CodePointSet[] sets;

        public CharReferences(CodePointSet... doEncode) {
            this.sets = doEncode;
        }

        public String encode(CharSequence source) {
            StringBuilder target = new StringBuilder();
            for (int i = 0, len = source.length(); i < len; ) {
                // Read a whole code point so that surrogate pairs stay together.
                int codePoint = codePointAt(source, i);
                if (doEncode(codePoint)) {
                    target.append("&#x").append(toHexString(codePoint)).append(";");
                } else {
                    target.appendCodePoint(codePoint);
                }
                // Advance by one or two chars, depending on the code point.
                i += charCount(codePoint);
            }
            return target.toString();
        }

        private boolean doEncode(int codePoint) {
            for (CodePointSet set : sets) {
                if (set.contains(codePoint)) {
                    return true;
                }
            }
            return false;
        }

        public static interface CodePointSet {
            public boolean contains(int codePoint);
        }

        private static final class GreaterThan implements CodePointSet {
            private final int value;

            public GreaterThan(int value) {
                this.value = value;
            }

            @Override
            public boolean contains(int codePoint) {
                return codePoint > value;
            }
        }

        private static final class Special implements CodePointSet {
            private final int[] set;

            private Special(int... sortedArray) {
                this.set = sortedArray;
            }

            @Override
            public boolean contains(int codePoint) {
                return Arrays.binarySearch(set, codePoint) >= 0;
            }
        }
    }
Here's the code being used in a JUnit unit test:
    @Test
    public void testEncoder() {
        String data = "> x \u2260 y";
        String encoded = new CharReferences(CharReferences.NOT_ISO8859_1, CharReferences.HTML).encode(data);
        Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
    }
For the input string "> x ≠ y" the method returns "&#x3e; x &#x2260; y".
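To see where those references come from, here is a minimal sketch of the formatting step on its own. The class name HexReferenceDemo and the helper toReference are made up for illustration and are not part of CharReferences; the sketch only assumes that each encoded code point is written as its hexadecimal value between "&#x" and ";".

```java
final class HexReferenceDemo {
    // Hypothetical helper: formats one code point as a hexadecimal
    // numeric character reference.
    static String toReference(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        // '>' is U+003E and is in the HTML set.
        System.out.println(toReference('>'));    // prints "&#x3e;"
        // The not-equal sign is U+2260, which is above 255.
        System.out.println(toReference(0x2260)); // prints "&#x2260;"
    }
}
```

The space, "x" and "y" are all below 256 and are not HTML special characters, so they pass through unchanged.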
Refactoring with lambdas
Here is the CharReferences type reimplemented with lambda functionality:
    package demo.java8;

    import java.util.Arrays;
    import java.util.function.IntPredicate;
    import java.util.stream.Collectors;

    public final class CharReferences {
        private static final int[] SORTED_SPECIAL = {'\"', '&', '\'', '<', '>'};

        public static final IntPredicate HTML = (int cp) -> Arrays.binarySearch(SORTED_SPECIAL, cp) >= 0;
        public static final IntPredicate ASCII = (int n) -> n <= 127;
        public static final IntPredicate ISO8859_1 = (int n) -> n <= 255;

        private final IntPredicate doEncode;

        public CharReferences(IntPredicate doEncode) {
            this.doEncode = doEncode;
        }

        public String encode(CharSequence source) {
            return source.codePoints().mapToObj(
                    (int codePoint) -> {
                        if (doEncode.test(codePoint)) {
                            return String.format("&#x%x;", codePoint);
                        } else {
                            return new String(Character.toChars(codePoint));
                        }
                    }).collect(Collectors.joining());
        }
    }
This code uses the java.util.function and java.util.stream packages to process character data via codePoints(), a new default method added to CharSequence.
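As a rough illustration of why the code point stream is convenient here, the sketch below compares it with chars(), the other stream method added to CharSequence. The class name CodePointsDemo and the helper counts are invented for this example.

```java
final class CodePointsDemo {
    // Returns { number of UTF-16 code units, number of code points }.
    static long[] counts(CharSequence text) {
        return new long[] { text.chars().count(), text.codePoints().count() };
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) needs a surrogate pair in UTF-16.
        String clef = new String(Character.toChars(0x1D11E));
        long[] c = counts("a" + clef);
        System.out.println(c[0] + " code units, " + c[1] + " code points");
        // prints "3 code units, 2 code points"
    }
}
```

Because codePoints() joins surrogate pairs, the encoder does not need to handle surrogates itself.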
The updated unit test:
    @Test
    public void testEncoder() {
        String data = "> x \u2260 y";
        IntPredicate htmlOrNotIso = CharReferences.ISO8859_1
                .negate()
                .or(CharReferences.HTML);
        String encoded = new CharReferences(htmlOrNotIso).encode(data);
        Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
    }
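The lambda implementation is less efficient (it creates an intermediate String for every code point and calls String.format for each encoded one), but it is shorter and more intuitive.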
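If the extra allocation matters, one possible variation is to collect straight into a single StringBuilder using the three-argument IntStream.collect. This is only a sketch under that assumption: BuilderEncoder is a hypothetical class, not part of the code above, and I have not measured whether it is actually faster.

```java
import java.util.function.IntPredicate;

final class BuilderEncoder {
    // Hypothetical alternative to CharReferences.encode that appends
    // every code point to one StringBuilder instead of joining Strings.
    static String encode(CharSequence source, IntPredicate doEncode) {
        return source.codePoints()
                .collect(StringBuilder::new,
                        (StringBuilder sb, int cp) -> {
                            if (doEncode.test(cp)) {
                                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
                            } else {
                                sb.appendCodePoint(cp);
                            }
                        },
                        StringBuilder::append) // combiner, only used by parallel streams
                .toString();
    }

    public static void main(String[] args) {
        // Encode HTML special characters and anything outside ISO 8859-1.
        IntPredicate htmlOrNotIso = cp -> cp > 255 || "\"&'<>".indexOf(cp) >= 0;
        System.out.println(encode("> x \u2260 y", htmlOrNotIso));
        // prints "&#x3e; x &#x2260; y"
    }
}
```

The trade-off is that the mutable builder makes the pipeline a little less declarative than the mapToObj and Collectors.joining() version.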