Tuesday, 5 August 2008

Java: int versus Integer

Changes in the Java language have made the differences between int and java.lang.Integer less obvious, but every Java developer should understand them. Unless otherwise stated, Java 7 syntax and types are used.

Many of these issues apply to all the primitive types and their wrapper types.

This post was rewritten in 2013. The original post was still generating comments (not the good kind) five years after it was written. This version is more detailed and provides better examples. Old comments have been deleted to avoid confusion - new criticism is welcome.

TL;DR

These paragraphs contain the essential points:

The remainder provide proofs or are informational.

History

Before Java 5 introduced autoboxing and auto-unboxing, adding two Integer values looked something like this:

public class Java14Math {
  public static void main(String[] args) {
    Integer x = new Integer(1);
    Integer y = new Integer(2);
    int _x = x.intValue();
    int _y = y.intValue();
    Integer xy = new Integer(_x + _y);
    System.out.println(xy);
  }
}

This code uses an object wrapper type to perform addition and prints the result.

Java 5 autoboxing reduced verbosity by offloading complexity onto the compiler:

    Integer a = 1;
    Integer b = 2;
    Integer ab = a + b;

As we will see, autoboxing is a mixed blessing.

Josh Bloch said in his Devoxx'11 critique The Evolution of Java: Past, Present, and Future:

[Autoboxing] caused more puzzlers than any other JSR 201 change ... we should have made generics work for primitives instead ... we never should have done this.

I've liberally lifted examples from this and other sources.

The int type

Declaration:

 int foo;
  • foo is a primitive
  • foo stores 32 bits of information (in the range Integer.MIN_VALUE to Integer.MAX_VALUE)
  • Literal integers (e.g. 123 or 0x7b) are of type int
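
The points above can be checked with a few lines of code. This is a minimal sketch; the class name IntBasics and the helper wrappingAdd are made up for illustration. It shows that decimal and hexadecimal literals produce the same int, and that arithmetic past Integer.MAX_VALUE wraps around rather than failing, which follows from the fixed 32-bit range.

```java
class IntBasics {
  /** Adds two ints; the result silently wraps around on overflow. */
  static int wrappingAdd(int a, int b) {
    return a + b;
  }

  public static void main(String[] args) {
    int decimal = 123;
    int hex = 0x7b; // the same value written as a hexadecimal literal
    System.out.println(decimal == hex); // true
    System.out.println(Integer.SIZE); // 32
    // One past the largest int wraps to the smallest int.
    System.out.println(wrappingAdd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // true
  }
}
```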

The Integer type

Declaration:

 Integer bar;
  • bar is an object reference
  • bar points to an object of type java.lang.Integer (or to null)
  • The object bar points at has an int member variable as described above

Choosing between int and Integer

I'll start with how these types should be used before going into detail on why.

  • Prefer int for performance reasons
  • Methods that take objects (including generic types like List<T>) will implicitly require the use of Integer
  • Use of Integer is relatively cheap for low values (-128 to 127) because Integer.valueOf(int) caches them - use Integer.valueOf(int) and not new Integer(int)
  • Do not use == or != with Integer types
  • Consider using Integer when you need to represent the absence of a value (null)
  • Beware unboxing Integer values to int with null values
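
The last two points pull in opposite directions, so here is a small sketch of both. The class NullUnboxing and its helpers orDefault and unboxingThrows are hypothetical names, not part of any library. A null Integer can stand for "no value", but assigning it to an int compiles to a call to intValue() and fails at runtime unless the null case is handled first.

```java
class NullUnboxing {
  /** Unboxes with an explicit fallback instead of risking a NullPointerException. */
  static int orDefault(Integer value, int fallback) {
    // value is only unboxed in the branch where it is known to be non-null
    return (value == null) ? fallback : value;
  }

  /** Returns true if unboxing the argument throws a NullPointerException. */
  static boolean unboxingThrows(Integer value) {
    try {
      int unboxed = value; // compiles to value.intValue()
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Integer missing = null; // represents the absence of a value
    System.out.println(unboxingThrows(missing)); // true
    System.out.println(unboxingThrows(42)); // false
    System.out.println(orDefault(missing, -1)); // -1
  }
}
```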

int and Integer are not interchangeable

Type declarations

You can't declare a list of type List<int> because generic type arguments must be reference types (subtypes of Object):

import java.util.Arrays;
import java.util.List;

public class ListOfNumbers {
  public static void main(String[] args) {
    List<Integer> list = Arrays.asList(4, 8, 15, 16, 23, 42);
    System.out.println(list);
  }
}

Type matching

Beware matching rules when mixing int and Integer.

Consider this code:

      Integer four = 4;
      List<Integer> nums = new ArrayList<>(asList(4, 8, 15, 16, 23, 42));
      nums.remove(four);
      System.out.println(nums);

It might seem reasonable to save a line of code and write:

      List<Integer> nums = new ArrayList<>(asList(4, 8, 15, 16, 23, 42));
      nums.remove(4);
      System.out.println(nums);

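However, the second snippet invokes a different method: the int argument matches List.remove(int index), which removes the element at index 4, rather than List.remove(Object o), which removes the first element equal to 4.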

This is also an example of poor method overloading design - two methods with the same name that do different things.

To match the same method the code could be written as nums.remove(Integer.valueOf(4)) or nums.remove((Integer) 4).

Here is a complete listing where the first block removes the value 23 at index 4 while the second removes 4 at index 0:

import static java.util.Arrays.asList;

import java.util.ArrayList;
import java.util.List;

public class ListGotchas {
  public static void main(String[] args) {
    {
      int four = 4;
      List<Integer> nums = new ArrayList<>(asList(4, 8, 15, 16, 23, 42));
      nums.remove(four); // matches List.remove(int)
      System.out.println(nums);
    }
    {
      Integer four = 4;
      List<Integer> nums = new ArrayList<>(asList(4, 8, 15, 16, 23, 42));
      nums.remove(four); // matches List.remove(Object)
      System.out.println(nums);
    }
  }
}

Equality

Here is a listing where the assertEquals method throws an error if two int values are not equal:

public class IntEquality {
  public static void assertEquals(int i1, int i2) {
    if (i1 != i2) {
      throw new AssertionError(i1 + " " + i2);
    }
  }

  public static void main(String[] args) {
    assertEquals(1, 1);
    assertEquals(1000, 1000);
    System.out.println("OK");
  }
}

This code prints OK.

Consider the requirement that the method also support null == null for Integer type references. It is not enough to switch the types:

public class BadIntegerEquality {
  /** @deprecated method is broken because it uses identity equality */
  @Deprecated
  public static void assertEquals(Integer i1, Integer i2) {
    if (i1 != i2) {
      throw new AssertionError(i1 + " " + i2);
    }
  }

  public static void main(String[] args) {
    assertEquals(1, 1);
    assertEquals(1000, 1000);
    System.out.println("OK");
  }
}

The above code succeeds for assertEquals(1, 1) but throws the AssertionError for assertEquals(1000, 1000).

Let's break down what is happening:

  • Autoboxing of an int literal causes invocation of Integer.valueOf(int)
  • The value of an Integer variable is a memory address
  • The value 1 lies in the range of values cached by valueOf
  • The value 1000 is not cached by default and causes the invocation of new Integer(1000)
  • The two objects containing value 1000 reside at different memory addresses

Simple analogy using two 20 cent coins:

[Image: two 20 cent coins]

These two coins represent the same value but they are not the same coin. The value of a variable of type Integer would be the location of one coin on the table.
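
Here is the coin analogy as a minimal sketch. IdentityVersusEquals and sameObject are illustrative names. The only identity result shown as certain is the one for 127, since valueOf is documented to cache -128 to 127; whether two boxed 1000s share an object depends on the JVM's cache settings, so that line has no expected output.

```java
class IdentityVersusEquals {
  /** Identity comparison: true only if both references point at the same object. */
  static boolean sameObject(Integer i1, Integer i2) {
    return i1 == i2;
  }

  public static void main(String[] args) {
    // Values from -128 to 127 always come from the valueOf cache.
    System.out.println(sameObject(Integer.valueOf(127), Integer.valueOf(127))); // true

    // Outside that range the outcome depends on the JVM's cache settings,
    // so no expected output is given for this line.
    System.out.println(sameObject(Integer.valueOf(1000), Integer.valueOf(1000)));

    // equals compares the wrapped int values and is reliable for any value.
    System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
  }
}
```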

The equals method must be used to compare the equality of two objects' internal state. Here is object equality testing with a null check:

public class IntegerEquality {
  /**
   * Equals with null check:
   * <pre>
   *  if (i1 == null) {
   *    return i2 == null;
   *  } else {
   *    return i1.equals(i2);
   *  }
   * </pre>
   */
  public static boolean areEqual(Object i1, Object i2) {
    return (i1 == null) ? (i2 == null) : i1.equals(i2);
  }

  public static void assertEquals(Integer i1, Integer i2) {
    if (!areEqual(i1, i2)) {
      throw new AssertionError(i1 + " " + i2);
    }
  }

  public static void main(String[] args) {
    assertEquals(1, 1);
    assertEquals(1000, 1000);
    assertEquals(null, null);
    System.out.println("OK");
  }
}
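
Since Java 7 the standard library offers java.util.Objects.equals(Object, Object), which performs the same null-safe comparison as areEqual above. The sketch below swaps it in; the class name ObjectsEquality is made up for illustration.

```java
import java.util.Objects;

class ObjectsEquality {
  /** Same contract as areEqual above, delegated to the standard library. */
  static void assertEquals(Integer i1, Integer i2) {
    if (!Objects.equals(i1, i2)) {
      throw new AssertionError(i1 + " " + i2);
    }
  }

  public static void main(String[] args) {
    assertEquals(1, 1);
    assertEquals(1000, 1000);
    assertEquals(null, null);
    System.out.println("OK");
  }
}
```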

Autoboxing and the ternary operator

Caution is required when using the ternary operator with boxed types and null. From the language specification:

If one of the second and third operands is of primitive type T, and the type of the other is the result of applying boxing conversion to T, then the type of the conditional expression is T.

That is, for expression types boolean ? int : Integer or boolean ? Integer : int the Integer is always unboxed to int.

This listing contains a getValueOrZero method that attempts to get a value from a thread-safe reference that may itself be null:

import java.util.concurrent.atomic.AtomicReference;

public class AutoboxingAndTernaryOperator {
  /** @deprecated throws NPE if ref is empty */
  @Deprecated
  public static int getValueOrZero(AtomicReference<Integer> ref) {
    Integer stored = (ref == null) ? 0 : ref.get();
    return (stored == null) ? 0 : stored;
  }

  public static void main(String[] args) {
    AtomicReference<Integer> empty = new AtomicReference<>();
    System.out.println(getValueOrZero(empty));
  }
}

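A NullPointerException is thrown on the first line of getValueOrZero if the AtomicReference contains a null value: the conditional expression has type int, so the null returned by ref.get() is unboxed.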

This can be fixed by changing the expression to (ref == null) ? (Integer) 0 : ref.get();.
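
For completeness, here is a sketch of the method with that cast applied. The class name TernaryWithCast is illustrative; the method body is otherwise the one from the listing above.

```java
import java.util.concurrent.atomic.AtomicReference;

class TernaryWithCast {
  static int getValueOrZero(AtomicReference<Integer> ref) {
    // Both branches are Integer, so the expression's type is Integer
    // and nothing is unboxed on this line.
    Integer stored = (ref == null) ? (Integer) 0 : ref.get();
    // stored is only unboxed in the branch where it is known to be non-null.
    return (stored == null) ? 0 : stored;
  }

  public static void main(String[] args) {
    System.out.println(getValueOrZero(new AtomicReference<Integer>())); // 0
    System.out.println(getValueOrZero(null)); // 0
    System.out.println(getValueOrZero(new AtomicReference<Integer>(42))); // 42
  }
}
```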

Verification of behaviour by inspection

public class Sums {
  public static Integer sum(Integer... series) {
    Integer result = 0;
    for (Integer i : series) {
      result += i;
    }
    return result;
  }
}

We can inspect the above method using the JDK's javap tool (for example, javap -c Sums).

 public static java.lang.Integer sum(java.lang.Integer...);
   Code:
      0: iconst_0
      1: invokestatic  #24                 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      4: astore_1
      5: aload_0
      6: dup
      7: astore        5
      9: arraylength
     10: istore        4
     12: iconst_0
     13: istore_3
     14: goto          38
     17: aload         5
     19: iload_3
     20: aaload
     21: astore_2
     22: aload_1
     23: invokevirtual #30                 // Method java/lang/Integer.intValue:()I
     26: aload_2
     27: invokevirtual #30                 // Method java/lang/Integer.intValue:()I
     30: iadd
     31: invokestatic  #24                 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
     34: astore_1
     35: iinc          3, 1
     38: iload_3
     39: iload         4
     41: if_icmplt     17
     44: aload_1
     45: areturn

The results tell us that autoboxing is syntactic sugar hiding method invocations in the compiled code.
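
Read back into Java, the bytecode corresponds roughly to the hand-written version below. This is my own reconstruction for illustration, not compiler output, and the class name DesugaredSums is made up. Each valueOf and intValue call matches one of the invokestatic or invokevirtual instructions in the disassembly.

```java
class DesugaredSums {
  /** Hand-written equivalent of Sums.sum with the boxing calls spelled out. */
  static Integer sum(Integer... series) {
    Integer result = Integer.valueOf(0);
    for (int index = 0; index < series.length; index++) {
      Integer i = series[index];
      // result += i becomes: unbox both operands, add, box the total again
      result = Integer.valueOf(result.intValue() + i.intValue());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(sum(1, 2, 3)); // 6
  }
}
```

Written this way, it is easier to see that every loop iteration performs two unboxing calls and one boxing call.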

Performance

To compare the performance of int and Integer I wrote a microbenchmark using Google's Caliper framework.

Caveats:

  • Caliper is still in development and I used a SNAPSHOT build I compiled myself (rev f6ebf866113c.)
  • Be aware of the limitations of microbenchmarks

I wrote two implementations of the Adler-32 algorithm:

public class ChecksumDemo {
  public static int intAdler32(byte[] data) {
    int a = 1;
    int b = 0;
    for (byte element : data) {
      a = (a + unsignedByte(element)) % 65521;
      b = (b + a) % 65521;
    }
    return (b << 16) | a;
  }

  public static Integer integerAdler32(byte[] data) {
    Integer a = 1;
    Integer b = 0;
    for (Byte element : data) {
      a = (a + unsignedByte(element)) % 65521;
      b = (b + a) % 65521;
    }
    return (b << 16) | a;
  }

  private static int unsignedByte(byte b) {
    return b & 0xFF;
  }
}

Java already has an Adler32 implementation (java.util.zip.Adler32); I just needed an algorithm.
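
The JDK class is still handy as a sanity check. The sketch below is not part of the benchmark. It copies intAdler32 from the listing above so that it is self-contained, and compares its result with java.util.zip.Adler32. The names Adler32CrossCheck and matchesJdk are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

class Adler32CrossCheck {
  /** Copy of intAdler32 from ChecksumDemo so this sketch is self-contained. */
  static int intAdler32(byte[] data) {
    int a = 1;
    int b = 0;
    for (byte element : data) {
      a = (a + (element & 0xFF)) % 65521;
      b = (b + a) % 65521;
    }
    return (b << 16) | a;
  }

  /** Returns true if the hand-written checksum agrees with the JDK's. */
  static boolean matchesJdk(byte[] data) {
    Adler32 jdk = new Adler32();
    jdk.update(data, 0, data.length);
    // getValue() returns the unsigned 32-bit checksum in a long,
    // so the int result is widened without sign extension before comparing.
    return jdk.getValue() == (intAdler32(data) & 0xFFFFFFFFL);
  }

  public static void main(String[] args) {
    byte[] sample = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
    System.out.println(matchesJdk(sample)); // true
  }
}
```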

Here are the Caliper results:

DATA (PARAMETER)  METHODNAME   RUNTIME (NS)  BYTES (B)  OBJECTS
1                 testInt           968.937      0.000    0.000
1                 testInteger     2,154.023  2,048.000  128.000
2                 testInt         1,947.981      0.000    0.000
2                 testInteger     4,973.421  4,000.000  250.000
4                 testInt         3,917.377      0.000    0.000
4                 testInteger     9,987.991  8,192.000  512.000

From these results we can say that, for this benchmark, Integer consumes more time and memory. Under similar conditions its heap (object) allocations could also add load to garbage collection. It is reasonable to prefer int implementations where possible.

For completeness, here is the benchmark code, though I wouldn't be surprised if it fails to compile on a release version of Caliper (it won't work on beta-1, for example).

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

import com.google.caliper.Benchmark;
import com.google.caliper.Param;

public class ChecksumBenchmark {
  @Param({ "1", "2", "4" })
  private RandomData data;

  @Benchmark
  public int testInt(int reps) {
    byte[] data = this.data.data;
    int result = 0;
    for (int i = 0; i < reps; i++) {
      result |= ChecksumDemo.intAdler32(data);
    }
    return result;
  }

  @Benchmark
  public int testInteger(int reps) {
    byte[] data = this.data.data;
    int result = 0;
    for (int i = 0; i < reps; i++) {
      result |= ChecksumDemo.integerAdler32(data);
    }
    return result;
  }

  public static class RandomData {
    private byte[] data;

    public static RandomData fromString(String param) {
      int size = Integer.parseInt(param) * 64;
      RandomData holder = new RandomData();
      holder.data = new byte[size];
      new SecureRandom(param.getBytes(StandardCharsets.US_ASCII))
          .nextBytes(holder.data);
      return holder;
    }
  }
}
