The JDK 8 Developer Preview came out recently, so I thought I would take a look at the API enhancements. This post demonstrates how refactoring to use the new types can change existing code.
This post refers to JDK build 1.8.0-ea-b106. IntelliJ IDEA has good Java 8 syntax support.
An HTML character reference encoder
This type encodes code points in a given set as character references. It could be used when emitting character data in a non-Unicode encoding (not that you would want to).
package demo.java7;
import java.util.Arrays;
import static java.lang.Character.codePointAt;
import static java.lang.Character.isHighSurrogate;
import static java.lang.Integer.toHexString;
public final class CharReferences {
  public static final CodePointSet HTML = new Special('\"', '&', '\'', '<', '>');
  public static final CodePointSet NOT_ASCII = new GreaterThan(127);
  public static final CodePointSet NOT_ISO8859_1 = new GreaterThan(255);
  private final CodePointSet[] sets;
  public CharReferences(CodePointSet... doEncode) {
    this.sets = doEncode;
  }
  public String encode(CharSequence source) {
    StringBuilder target = new StringBuilder();
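    for (int i = 0, len = source.length(); i < len; i++) {
      char ch = source.charAt(i);
      // Decode a surrogate pair into a single code point where one is present.
      int codePoint = isHighSurrogate(ch) ? codePointAt(source, i) : ch;
      if (Character.isSupplementaryCodePoint(codePoint)) {
        i++; // skip the low surrogate that was consumed with the high one
      }
      if (doEncode(codePoint)) {
        target.append("&#x").append(toHexString(codePoint)).append(";");
      } else {
        // appendCodePoint writes both halves of a surrogate pair when needed
        target.appendCodePoint(codePoint);
      }
    }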
    return target.toString();
  }
  private boolean doEncode(int codePoint) {
    for (CodePointSet set : sets) {
      if (set.contains(codePoint)) {
        return true;
      }
    }
    return false;
  }
  public static interface CodePointSet {
    public boolean contains(int codePoint);
  }
  private static final class GreaterThan implements CodePointSet {
    private final int value;
    public GreaterThan(int value) {
      this.value = value;
    }
    @Override
    public boolean contains(int codePoint) {
      return codePoint > value;
    }
  }
  private static final class Special implements CodePointSet {
    private final int[] set;
    private Special(int... sortedArray) {
      this.set = sortedArray;
    }
    @Override
    public boolean contains(int codePoint) {
      return Arrays.binarySearch(set, codePoint) >= 0;
    }
  }
}
Here's the code being used in a JUnit unit test:
  @Test
  public void testEncoder() {
    String data = "> x \u2260 y";
    String encoded = new CharReferences(CharReferences.NOT_ISO8859_1, CharReferences.HTML).encode(data);
    Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
  }
For the input string "> x ≠ y" the method returns "&#x3e; x &#x2260; y".
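To see why the encoder works on code points rather than chars, consider a character outside the Basic Multilingual Plane. The sketch below is a standalone illustration, not part of the class above; the names SupplementaryDemo and toReference are hypothetical.

```java
// Hypothetical standalone demo; not part of the CharReferences class.
final class SupplementaryDemo {
  // Formats one code point as a hexadecimal character reference.
  static String toReference(int codePoint) {
    return "&#x" + Integer.toHexString(codePoint) + ";";
  }

  public static void main(String[] args) {
    // U+1D11E MUSICAL SYMBOL G CLEF is stored as two chars (a surrogate pair).
    String clef = "\uD834\uDD1E";
    System.out.println(clef.length());                // 2
    int codePoint = Character.codePointAt(clef, 0);
    System.out.println(toReference(codePoint));       // &#x1d11e;
    // Encoding a lone surrogate char instead gives a reference to an invalid character.
    System.out.println(toReference(clef.charAt(0)));  // &#xd834;
  }
}
```

Decoding the pair first is what keeps the output to a single, valid reference.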
Refactoring with lambdas
Here is the CharReferences type reimplemented with lambda functionality:
package demo.java8;
import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
public final class CharReferences {
  private static final int[] SORTED_SPECIAL = {'\"', '&', '\'', '<', '>'};
  public static final IntPredicate HTML = (int cp) -> Arrays.binarySearch(SORTED_SPECIAL, cp) >= 0;
  public static final IntPredicate ASCII = (int n) -> n <= 127;
  public static final IntPredicate ISO8859_1 = (int n) -> n <= 255;
  private final IntPredicate doEncode;
  public CharReferences(IntPredicate doEncode) {
    this.doEncode = doEncode;
  }
  public String encode(CharSequence source) {
    return source.codePoints().mapToObj(
        (int codePoint) -> {
          if (doEncode.test(codePoint)) {
            return String.format("&#x%x;", codePoint);
          } else {
            return new String(Character.toChars(codePoint));
          }
        }).collect(Collectors.joining());
  }
}
This code utilizes the java.util.function and java.util.stream packages to process character data via a new default method added to CharSequence.
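As a small illustration of that default method, the sketch below shows that CharSequence.codePoints() yields whole code points as an IntStream, whereas chars() exposes each surrogate separately. The class name CodePointsDemo and the describe helper are hypothetical.

```java
import java.util.stream.Collectors;

// Hypothetical standalone demo of CharSequence.codePoints().
final class CodePointsDemo {
  // Renders each code point of the input as U+XXXX, separated by spaces.
  static String describe(CharSequence source) {
    return source.codePoints()
        .mapToObj(cp -> String.format("U+%04X", cp))
        .collect(Collectors.joining(" "));
  }

  public static void main(String[] args) {
    // 'a', then U+1D11E (a surrogate pair), then 'b'
    String text = "a\uD834\uDD1Eb";
    System.out.println(describe(text));        // U+0061 U+1D11E U+0062
    System.out.println(text.chars().count());  // 4
  }
}
```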
The updated unit test:
  @Test
  public void testEncoder() {
    String data = "> x \u2260 y";
    IntPredicate htmlOrNotIso = CharReferences.ISO8859_1
        .negate()
        .or(CharReferences.HTML);
    String encoded = new CharReferences(htmlOrNotIso).encode(data);
    Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
  }
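The negate() and or() calls in the test are default methods on IntPredicate. Here is a minimal sketch of how they compose, with the predicates repeated in simplified form so the example stands alone; PredicateDemo, MARKUP and NEEDS_ENCODING are hypothetical names.

```java
import java.util.function.IntPredicate;

// Hypothetical standalone demo of IntPredicate composition.
final class PredicateDemo {
  static final IntPredicate ISO8859_1 = n -> n <= 255;
  static final IntPredicate MARKUP = n -> n == '<' || n == '>' || n == '&';
  // True for code points outside ISO 8859-1 or for markup characters.
  static final IntPredicate NEEDS_ENCODING = ISO8859_1.negate().or(MARKUP);

  public static void main(String[] args) {
    System.out.println(NEEDS_ENCODING.test('x'));     // false
    System.out.println(NEEDS_ENCODING.test('>'));     // true
    System.out.println(NEEDS_ENCODING.test(0x2260));  // true
  }
}
```

Because the predicates compose, the constructor only needs to accept a single IntPredicate rather than a varargs array of sets.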
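The lambda implementation is likely less efficient, since it creates a String for every code point and parses a format string for each encoded one, but it is shorter and, to my eye, more intuitive.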
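One way to win back some efficiency while keeping the stream style might be to collect straight into a StringBuilder. The following is a sketch under that assumption, not a measured optimization; BuilderEncoder and its static encode signature are hypothetical.

```java
import java.util.function.IntPredicate;

// Hypothetical variant that accumulates into one StringBuilder.
final class BuilderEncoder {
  // Encodes code points matched by doEncode as hexadecimal character references.
  static String encode(CharSequence source, IntPredicate doEncode) {
    return source.codePoints().collect(
        StringBuilder::new,
        (StringBuilder sb, int cp) -> {
          if (doEncode.test(cp)) {
            sb.append("&#x").append(Integer.toHexString(cp)).append(';');
          } else {
            sb.appendCodePoint(cp);
          }
        },
        StringBuilder::append) // combiner, only needed for parallel streams
        .toString();
  }

  public static void main(String[] args) {
    IntPredicate notIso = n -> n > 255;
    System.out.println(encode("> x \u2260 y", notIso.or(n -> n == '>')));
    // &#x3e; x &#x2260; y
  }
}
```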
 
 