Saturday, 14 September 2013

Java: lambdas, streams and functions (Java 8 pre-release)

The JDK 8 Developer Preview came out recently so I thought I would take a look at the API enhancements. This post demonstrates how refactoring to utilize the new types can change code.

This post refers to JDK build 1.8.0-ea-b106. IntelliJ IDEA has good Java 8 syntax support.

An HTML character reference encoder

This type encodes code points in a given set as character references. It could be used when you are emitting character data in a non-Unicode encoding (not that you would want to).

package demo.java7;

import java.util.Arrays;

import static java.lang.Character.codePointAt;
import static java.lang.Character.isHighSurrogate;
import static java.lang.Integer.toHexString;

public final class CharReferences {
  public static final CodePointSet HTML = new Special('\"', '&', '\'', '<', '>');
  public static final CodePointSet NOT_ASCII = new GreaterThan(127);
  public static final CodePointSet NOT_ISO8859_1 = new GreaterThan(255);

  private final CodePointSet[] sets;

  public CharReferences(CodePointSet... doEncode) {
    this.sets = doEncode;
  }

  public String encode(CharSequence source) {
    StringBuilder target = new StringBuilder();
    // Iterate by code point so surrogate pairs are read and written whole.
    for (int i = 0, len = source.length(); i < len; ) {
      int codePoint = codePointAt(source, i);
      i += Character.charCount(codePoint);
      if (doEncode(codePoint)) {
        target.append("&#x").append(toHexString(codePoint)).append(";");
      } else {
        // appendCodePoint keeps both halves of an unencoded surrogate pair
        target.appendCodePoint(codePoint);
      }
    }
    return target.toString();
  }

  private boolean doEncode(int codePoint) {
    for (CodePointSet set : sets) {
      if (set.contains(codePoint)) {
        return true;
      }
    }
    return false;
  }

  public static interface CodePointSet {
    public boolean contains(int codePoint);
  }

  private static final class GreaterThan implements CodePointSet {
    private final int value;

    public GreaterThan(int value) {
      this.value = value;
    }

    @Override
    public boolean contains(int codePoint) {
      return codePoint > value;
    }
  }

  private static final class Special implements CodePointSet {
    private final int[] set;

    private Special(int... sortedArray) {
      this.set = sortedArray;
    }

    @Override
    public boolean contains(int codePoint) {
      return Arrays.binarySearch(set, codePoint) >= 0;
    }
  }
}

Here's the code being used in a JUnit unit test:

  @Test
  public void testEncoder() {
    String data = "> x \u2260 y";
    String encoded = new CharReferences(CharReferences.NOT_ISO8859_1, CharReferences.HTML).encode(data);
    Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
  }

For the input string "> x ≠ y" the method returns "&#x3e; x &#x2260; y".

Refactoring with lambdas

Here is the CharReferences type reimplemented with lambda functionality:

package demo.java8;

import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;

public final class CharReferences {

  // Must stay in ascending order: Arrays.binarySearch requires a sorted array.
  private static final int[] SORTED_SPECIAL = {'\"', '&', '\'', '<', '>'};
  public static final IntPredicate HTML = (int cp) -> Arrays.binarySearch(SORTED_SPECIAL, cp) >= 0;
  public static final IntPredicate ASCII = (int n) -> n <= 127;
  public static final IntPredicate ISO8859_1 = (int n) -> n <= 255;

  private final IntPredicate doEncode;

  public CharReferences(IntPredicate doEncode) {
    this.doEncode = doEncode;
  }

  public String encode(CharSequence source) {
    return source.codePoints().mapToObj(
        (int codePoint) -> {
          if (doEncode.test(codePoint)) {
            return String.format("&#x%x;", codePoint);
          } else {
            return new String(Character.toChars(codePoint));
          }
        }).collect(Collectors.joining());
  }
}

This code uses the java.util.function and java.util.stream packages. The character data is processed through codePoints(), a new default method added to CharSequence that returns an IntStream of code points.
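
To see what the stream of code points looks like, here is a minimal sketch. The class name CodePointsDemo and the describe method are made up for this illustration; only CharSequence.codePoints(), mapToObj and Collectors.joining come from the JDK. It shows that a surrogate pair arrives as a single int rather than two chars.

```java
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the CharReferences code.
final class CodePointsDemo {

  // Formats each code point of the input as U+XXXX, separated by spaces.
  static String describe(CharSequence source) {
    return source.codePoints()
        .mapToObj(cp -> String.format("U+%04X", cp))
        .collect(Collectors.joining(" "));
  }

  public static void main(String[] args) {
    // 'a' followed by U+1D11E, written as the surrogate pair D834 DD1E
    System.out.println(describe("a\uD834\uDD1E")); // prints "U+0061 U+1D11E"
  }
}
```

The input string has three chars but the stream yields two values, which is why the encoder never has to check for surrogates itself.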

The updated unit test:

  @Test
  public void testEncoder() {
    String data = "> x \u2260 y";
    IntPredicate htmlOrNotIso = CharReferences.ISO8859_1
        .negate()
        .or(CharReferences.HTML);
    String encoded = new CharReferences(htmlOrNotIso).encode(data);
    Assert.assertEquals("&#x3e; x &#x2260; y", encoded);
  }
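
The composition in the test relies on the default methods negate(), or() and and() that IntPredicate provides. Here is a small standalone sketch of how they combine. The class PredicateDemo and its constants are stand-ins invented for this example, and the HTML predicate uses String.indexOf rather than the binary search from the post.

```java
import java.util.function.IntPredicate;

// Hypothetical demo class mirroring the predicates from the post.
final class PredicateDemo {
  static final IntPredicate ISO8859_1 = n -> n <= 255;
  // Stand-in for CharReferences.HTML: true for the five special characters.
  static final IntPredicate HTML = cp -> "\"&'<>".indexOf(cp) >= 0;

  // "Outside ISO 8859-1, or an HTML special character"
  static final IntPredicate ENCODE = ISO8859_1.negate().or(HTML);

  // "Inside ISO 8859-1 and not an HTML special character"
  static final IntPredicate PASS_THROUGH = ISO8859_1.and(HTML.negate());

  public static void main(String[] args) {
    System.out.println(ENCODE.test('>'));       // true: HTML special
    System.out.println(ENCODE.test(0x2260));    // true: above 255
    System.out.println(ENCODE.test('x'));       // false
    System.out.println(PASS_THROUGH.test('x')); // true
  }
}
```

Each call returns a new predicate, so the constants can be shared and recombined without affecting one another.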

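The lambda implementation is likely less efficient (it creates a String for every code point and calls String.format for every reference) but it is shorter and, to my eye, more intuitive.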
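
If the efficiency cost matters, one possible middle ground is to keep the code point stream but write into a single StringBuilder. This is only a sketch under that assumption, not a measured improvement; the class name StreamEncoder is invented for the example.

```java
import java.util.function.IntPredicate;

// Hypothetical variant of CharReferences that avoids per-code-point Strings.
final class StreamEncoder {
  private final IntPredicate doEncode;

  StreamEncoder(IntPredicate doEncode) {
    this.doEncode = doEncode;
  }

  String encode(CharSequence source) {
    StringBuilder target = new StringBuilder(source.length());
    // forEachOrdered guarantees the code points are visited in sequence
    source.codePoints().forEachOrdered(cp -> {
      if (doEncode.test(cp)) {
        target.append("&#x").append(Integer.toHexString(cp)).append(';');
      } else {
        target.appendCodePoint(cp);
      }
    });
    return target.toString();
  }

  public static void main(String[] args) {
    IntPredicate htmlOrNotIso = cp -> cp > 255 || "\"&'<>".indexOf(cp) >= 0;
    // prints "&#x3e; x &#x2260; y"
    System.out.println(new StreamEncoder(htmlOrNotIso).encode("> x \u2260 y"));
  }
}
```

The trade-off is that the lambda now mutates state outside the stream, which is fine for a sequential stream but gives up some of the declarative feel of the joining collector.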
