The JDK 8 Developer Preview came out recently, so I thought I would take a look at the API enhancements. This post demonstrates how refactoring to use the new functional types can change code.
This post refers to JDK build 1.8.0-ea-b106. IntelliJ IDEA has good Java 8 syntax support.
An HTML character reference encoder
This type encodes code points in a given set as character references. It could be used when emitting character data in a non-Unicode encoding (not that you would want to).
package demo.java7;

import java.util.Arrays;

import static java.lang.Character.charCount;
import static java.lang.Character.codePointAt;
import static java.lang.Integer.toHexString;

public final class CharReferences {
    public static final CodePointSet HTML = new Special('\"', '&', '\'', '<', '>');
    public static final CodePointSet NOT_ASCII = new GreaterThan(127);
    public static final CodePointSet NOT_ISO8859_1 = new GreaterThan(255);

    private final CodePointSet[] sets;

    public CharReferences(CodePointSet... doEncode) {
        this.sets = doEncode;
    }

    public String encode(CharSequence source) {
        StringBuilder target = new StringBuilder();
        // Advance by whole code points so surrogate pairs are never split.
        for (int i = 0, len = source.length(); i < len; ) {
            int codePoint = codePointAt(source, i);
            i += charCount(codePoint);
            if (doEncode(codePoint)) {
                target.append("&#x").append(toHexString(codePoint)).append(";");
            } else {
                // appendCodePoint writes both halves of a surrogate pair.
                target.appendCodePoint(codePoint);
            }
        }
        return target.toString();
    }

    private boolean doEncode(int codePoint) {
        for (CodePointSet set : sets) {
            if (set.contains(codePoint)) {
                return true;
            }
        }
        return false;
    }

    public static interface CodePointSet {
        public boolean contains(int codePoint);
    }

    private static final class GreaterThan implements CodePointSet {
        private final int value;

        public GreaterThan(int value) {
            this.value = value;
        }

        @Override
        public boolean contains(int codePoint) {
            return codePoint > value;
        }
    }

    private static final class Special implements CodePointSet {
        // Must be sorted ascending for binarySearch to work.
        private final int[] set;

        private Special(int... sortedArray) {
            this.set = sortedArray;
        }

        @Override
        public boolean contains(int codePoint) {
            return Arrays.binarySearch(set, codePoint) >= 0;
        }
    }
}
Here's the code being used in a JUnit unit test:
@Test
public void testEncoder() {
    String data = "> x \u2260 y";
    String encoded = new CharReferences(CharReferences.NOT_ISO8859_1, CharReferences.HTML).encode(data);
    Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
}
For the input string "> x ≠ y" the method returns "&#x3e; x &#x2260; y".
Refactoring with lambdas
Here is the CharReferences type reimplemented with lambda functionality:
package demo.java8;

import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

public final class CharReferences {
    private static final int[] SORTED_SPECIAL = {'\"', '&', '\'', '<', '>'};
    public static final IntPredicate HTML = (int cp) -> Arrays.binarySearch(SORTED_SPECIAL, cp) >= 0;
    public static final IntPredicate ASCII = (int n) -> n <= 127;
    public static final IntPredicate ISO8859_1 = (int n) -> n <= 255;

    private final IntPredicate doEncode;

    public CharReferences(IntPredicate doEncode) {
        this.doEncode = doEncode;
    }

    public String encode(CharSequence source) {
        return source.codePoints().mapToObj(
            (int codePoint) -> {
                if (doEncode.test(codePoint)) {
                    return String.format("&#x%x;", codePoint);
                } else {
                    return new String(Character.toChars(codePoint));
                }
            }).collect(Collectors.joining());
    }
}
This code uses the java.util.function and java.util.stream packages to process character data via codePoints(), a new default method added to CharSequence.
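To see what codePoints() adds over indexing by char, here is a minimal sketch. The class name CodePointsDemo and the describe helper are made up for illustration and are not part of the CharReferences type; only CharSequence.codePoints(), IntStream.mapToObj and Collectors.joining come from the JDK.

```java
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the post's CharReferences type.
final class CodePointsDemo {

    // Lists the code points of the input in U+XXXX notation, separated by spaces.
    static String describe(CharSequence source) {
        return source.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The letter "a" followed by U+1F600, written as a surrogate pair.
        String text = "a\uD83D\uDE00";
        System.out.println(text.length());             // 3 UTF-16 code units
        System.out.println(text.codePoints().count()); // 2 code points
        System.out.println(describe(text));            // U+0061 U+1F600
    }
}
```

The stream yields one int per code point, so a surrogate pair arrives as a single value and the caller does not need to track indexes.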
The updated unit test:
@Test
public void testEncoder() {
    String data = "> x \u2260 y";
    IntPredicate htmlOrNotIso = CharReferences.ISO8859_1
            .negate()
            .or(CharReferences.HTML);
    String encoded = new CharReferences(htmlOrNotIso).encode(data);
    Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
}
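The negate() and or() calls in the test are default methods on IntPredicate, so predicates can be combined without writing a new class for each combination. A small sketch of that composition on its own follows; PredicateDemo is a hypothetical class, and its HTML predicate is simplified to plain comparisons rather than the binary search used in the post.

```java
import java.util.function.IntPredicate;

// Hypothetical demo class; the predicates are redefined here so the example stands alone.
final class PredicateDemo {
    static final IntPredicate ISO8859_1 = n -> n <= 255;
    static final IntPredicate HTML =
            cp -> cp == '"' || cp == '&' || cp == '\'' || cp == '<' || cp == '>';

    // True for code points that are outside ISO-8859-1 or are special in HTML.
    static final IntPredicate HTML_OR_NOT_ISO = ISO8859_1.negate().or(HTML);

    public static void main(String[] args) {
        System.out.println(HTML_OR_NOT_ISO.test('>'));    // true: HTML special character
        System.out.println(HTML_OR_NOT_ISO.test(0x2260)); // true: above 255
        System.out.println(HTML_OR_NOT_ISO.test('x'));    // false: ordinary Latin-1 letter
    }
}
```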
The lambda implementation is less efficient (it creates a String for every code point and calls String.format for every encoded one), but it is shorter and more intuitive.
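If the per-code-point allocations matter, one possible middle ground is to keep codePoints() and IntPredicate but append to a single StringBuilder. The sketch below is an untested-for-performance variant, not the implementation from this post; BuilderEncoder is a hypothetical name, and no measurements back the assumption that it is faster.

```java
import java.util.function.IntPredicate;

// Hypothetical variant of the encoder; an assumption for illustration only.
final class BuilderEncoder {
    private final IntPredicate doEncode;

    BuilderEncoder(IntPredicate doEncode) {
        this.doEncode = doEncode;
    }

    String encode(CharSequence source) {
        StringBuilder target = new StringBuilder(source.length());
        // forEachOrdered keeps the code points in their original order.
        source.codePoints().forEachOrdered(codePoint -> {
            if (doEncode.test(codePoint)) {
                target.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            } else {
                // Writes one or two chars depending on the code point.
                target.appendCodePoint(codePoint);
            }
        });
        return target.toString();
    }

    public static void main(String[] args) {
        // Encode anything above 255 plus the five HTML special characters.
        IntPredicate htmlOrNotIso = cp -> cp > 255 || "\"&'<>".indexOf(cp) >= 0;
        System.out.println(new BuilderEncoder(htmlOrNotIso).encode("> x \u2260 y"));
        // prints: &#x3e; x &#x2260; y
    }
}
```

The trade-off is that the lambda now mutates state outside the stream, which gives up some of the declarative style of the Collectors.joining() version.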