2. Bytecode compilation at compile-time

What is bytecode and how does it work? What does Scala code look like when compiled to bytecode?

First, why do we even care about bytecode?

Don’t expect to have to look at bytecode too frequently. Usually (hopefully) we can trust that the compiler has transformed our code into bytecode correctly. Most developers won’t ever have to get into the detail of the generated bytecode, and that’s a good thing: that’s why we have higher-level languages!

But it does give some context as to what’s happening, and having this context can give you an understanding of why certain optimisations are possible or not, and why some code runs blazingly fast while some limps along.

Bytecode is for a stack-based machine

Bytecode is the first intermediate step on the path from your source code to the CPU. It’s a language describing your code in a platform-independent way, targeting a fictional platform (the Java Virtual Machine) rather than any real processor.

It describes the execution of your program on a stack-based machine, as opposed to the register-based processors we’re used to. Values are put on a stack; functions use those values as parameters and replace them with results.

It’s not particularly efficient, but it’s not supposed to be. We’re still quite a long way from the metal.

Here’s an example of the steps of execution of a short code snippet:

"my favourite number is: " + (1 + 2)
  1. first, the two integer values, 1 and 2, are pushed onto the stack.
  2. the iadd instruction to add two integers is invoked; it pops both values and leaves the result, 3, on the stack.
  3. next we need to convert this integer 3 into a string that can be concatenated with the longer prefix. The instruction here is invokestatic String.valueOf, which invokes this method with the parameter 3 and leaves the string "3" on the stack.
  4. finally, run the instruction invokevirtual String.concat, with the longer string "my favourite number is: " as the instance the method is called on and "3" as its argument. For that to work the prefix has to sit beneath "3" on the stack, so in real bytecode it is pushed before step 1. The expected result is left on the stack.

Note the static and virtual instructions – these have the same meaning as described above. The virtual call is because the String concatenation method “belongs” to the string on which it is called, and the static call has no such instance.
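
The walkthrough above can be mimicked with a toy interpreter. This is only a sketch: the instruction names Push, IAdd, StringValueOf and StringConcat are hypothetical stand-ins for the real JVM instructions, and the real JVM does far more (type checking, frames, constant pools). The sketch pushes the prefix first so that it ends up beneath "3" as the receiver of concat.

```scala
object StackMachineSketch {
  // Hypothetical mini instruction set, loosely modelled on the JVM instructions named above.
  sealed trait Instr
  final case class Push(value: Any) extends Instr // stands in for constant-loading instructions
  case object IAdd extends Instr                  // pops two ints, pushes their sum
  case object StringValueOf extends Instr         // "static" call: pops an int, pushes its String form
  case object StringConcat extends Instr          // "virtual" call: pops the argument, then the receiver

  // The operand stack is a List whose head is the top of the stack.
  def step(stack: List[Any], instr: Instr): List[Any] = (instr, stack) match {
    case (Push(v), _) => v :: stack
    case (IAdd, (b: Int) :: (a: Int) :: rest) => (a + b) :: rest
    case (StringValueOf, (i: Int) :: rest) => String.valueOf(i) :: rest
    case (StringConcat, (arg: String) :: (receiver: String) :: rest) => receiver.concat(arg) :: rest
    case _ => throw new IllegalStateException(s"cannot execute $instr on stack $stack")
  }

  // Run a whole program, starting from an empty stack.
  def run(program: List[Instr]): List[Any] =
    program.foldLeft(List.empty[Any])(step)

  // "my favourite number is: " + (1 + 2)
  val program: List[Instr] = List(
    Push("my favourite number is: "), // receiver of concat, pushed first
    Push(1),
    Push(2),
    IAdd,
    StringValueOf,
    StringConcat
  )

  def main(args: Array[String]): Unit =
    println(run(program))
}
```

Running the program leaves a single value on the stack, the string "my favourite number is: 3". Swapping the order so the prefix is pushed last would make "3" the receiver and produce the pieces the wrong way round.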

Reading a classfile

The JDK ships with a disassembler app called javap that can display bytecode in a somewhat human-readable form.

javap -p -c <class-name>

We’re going to use a trivial example to look at some bytecode. This example ScalaConstants contains a constant value and a utility function in an object, which is Scala’s implementation of a singleton. Below it is the bytecode as shown by javap -p ScalaConstants, just the type signatures with no disassembled code, for now.

object ScalaConstants {
  val ichBinEinConstant = "some string"
  def ichBinEinUtilityFunction(param: Int) = param.toString
}

Compiled from "ScalaConstants.scala"
public final class ScalaConstants$ {
  public static final ScalaConstants$ MODULE$;
  private final java.lang.String ichBinEinConstant;
  public static {};
  public java.lang.String ichBinEinConstant();
  public java.lang.String ichBinEinUtilityFunction(int);
  private ScalaConstants$();
}

public final class ScalaConstants {
  public static java.lang.String ichBinEinUtilityFunction(int);
  public static java.lang.String ichBinEinConstant();
}

Notice first that there are two classes, one with a $ appended to the name. This is synthetically generated by the Scala compiler, and is how singleton objects (and companion objects) are implemented. This is a sort of hidden type – it’s accessed only through the main type ScalaConstants that’s declared in source.

Lines with parentheses represent methods, lines without represent fields. The public static {} is the class’s static initialiser that runs when the class is loaded. In this case, the static initialiser runs the constructor, and the new instance is saved in the MODULE$ field. This is the globally visible singleton instance.
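
The MODULE$ field can also be observed from Scala itself using Java reflection. The following is a small sketch, assuming ScalaConstants is compiled as an ordinary top-level object as in the example; the InspectSingleton object and its member names are made up for illustration.

```scala
object ScalaConstants {
  val ichBinEinConstant = "some string"
  def ichBinEinUtilityFunction(param: Int) = param.toString
}

object InspectSingleton {
  // The runtime class of the singleton is the synthetic class whose name ends in `$`.
  def moduleClass: Class[_] = ScalaConstants.getClass

  // Read the public static MODULE$ field reflectively; `null` because static fields need no instance.
  def moduleField: AnyRef = moduleClass.getField("MODULE$").get(null)

  def main(args: Array[String]): Unit = {
    println(moduleClass.getName)
    // Reference equality: the field holds the very same instance we refer to as ScalaConstants.
    println(moduleField eq ScalaConstants)
  }
}
```

The reference comparison shows that the value stored in MODULE$ is the same instance that Scala source code refers to simply as ScalaConstants.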

Calling a method on an object

What happens when a method is invoked? Suppose we want to see what happens for ScalaConstants.ichBinEinUtilityFunction(3). Let’s look at the bytecode for the utility function, which we can get using javap -c ScalaConstants.

public final class ScalaConstants {
  public static java.lang.String ichBinEinUtilityFunction(int);
    Code:
       0: getstatic     #16  // Field ScalaConstants$.MODULE$:LScalaConstants$;
       3: iload_0
       4: invokevirtual #18  // Method ScalaConstants$.ichBinEinUtilityFunction:...
       7: areturn
}
  1. load the singleton ScalaConstants$ object from the static field called MODULE$ onto the stack
  2. load the integer parameter onto the stack
  3. invoke the instance method which, yes, has the same name as this function
  4. return the String reference

What about inside the delegate function? From javap -c ScalaConstants$:

public final class ScalaConstants$ {
  public java.lang.String ichBinEinUtilityFunction(int);
    Code:
       0: iload_1
       1: invokestatic  #26  // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
       4: invokevirtual #29  // Method java/lang/Object.toString:()Ljava/lang/String;
       7: areturn
}

First note that this one doesn’t have a static modifier – it’s an instance method on the singleton object.

  1. load the integer parameter onto the stack.
  2. turn the int primitive into an object.
  3. invoke the toString method on the new object. This is a virtual call because it’s a non-static method, called on an instance.
  4. return the string reference.

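A method’s parameters arrive in numbered local variable slots, and instructions like iload_0 copy them from a slot onto the stack. For a non-static method, called on an instance, slot zero holds this, so the declared parameters start at slot one. You can see the difference when loading the int parameter in these last two examples: iload_1 in the instance method instead of iload_0 in the static one.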
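
The slot numbering can be pictured with a very small model. This is a simplified sketch with hypothetical names (FrameSketch, staticLocals, instanceLocals); it ignores the fact that long and double values take two slots each on the real JVM.

```scala
object FrameSketch {
  // Static method: the declared parameters start at slot 0.
  def staticLocals(params: Any*): Vector[Any] = params.toVector

  // Instance method: slot 0 holds `this`, the declared parameters follow from slot 1.
  def instanceLocals(receiver: AnyRef, params: Any*): Vector[Any] =
    receiver +: params.toVector

  def main(args: Array[String]): Unit = {
    val staticFrame = staticLocals(3)
    val instanceFrame = instanceLocals("the singleton", 3)
    println(staticFrame(0))   // what iload_0 would load in the static forwarder
    println(instanceFrame(1)) // what iload_1 would load in the instance method
  }
}
```

In both frames the int parameter is the same value; only the index it lives at changes, which is why the two disassembled methods use different load instructions.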

Initialising a singleton object

One more example – how does the singleton itself get initialised? Below is the initialisation code from the disassembled ScalaConstants$ class.

public final class ScalaConstants$ {
  public static final ScalaConstants$ MODULE$;
  private final java.lang.String ichBinEinConstant;

  public static {};
    Code:
       0: new           #2   // class ScalaConstants$
       3: invokespecial #12  // Method "<init>":()V
       6: return

  private ScalaConstants$();
    Code:
       0: aload_0
       1: invokespecial #28  // Method java/lang/Object."<init>":()V
       4: aload_0
       5: putstatic     #30  // Field MODULE$:LScalaConstants$;
       8: aload_0
       9: ldc           #32  // String some string
      11: putfield      #17  // Field ichBinEinConstant:Ljava/lang/String;
      14: return
}

static {} is the static initialiser that is run when the class is loaded. The code here creates the singleton object, invokes its constructor and returns.

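Let’s walk through the constructor of the class, private ScalaConstants$():

  1. load this onto the stack and invoke the superclass constructor, the <init> method of java.lang.Object.
  2. load this again and store it in the static field MODULE$, which publishes the singleton instance.
  3. load this once more, push the constant "some string", and store it in the instance field ichBinEinConstant.
  4. return.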

That’s enough of an introduction to bytecode. The language compilers scalac and javac that compile to bytecode do some optimisation, including language-specific transformations. An often-cited example is turning a chain of string concatenations like a + b + c (with non-constant operands) into a single StringBuilder expression, which is more efficient because it avoids repeatedly copying intermediate strings.
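
As a rough illustration of that rewrite, the two methods below compute the same result. This is a hand-written sketch of the idea, not the compiler’s actual output: the names ConcatSketch, viaPlus and viaBuilder are made up, and the exact bytecode depends on the compiler and its version (newer versions of javac, for instance, emit an invokedynamic call instead of StringBuilder code).

```scala
object ConcatSketch {
  // What we write in source code.
  def viaPlus(a: String, b: String, c: String): String =
    a + b + c

  // Roughly what a StringBuilder-based translation looks like:
  // one builder, three appends, one final String.
  def viaBuilder(a: String, b: String, c: String): String =
    new java.lang.StringBuilder().append(a).append(b).append(c).toString

  def main(args: Array[String]): Unit = {
    println(viaPlus("a", "b", "c"))
    println(viaBuilder("a", "b", "c"))
  }
}
```

Without the rewrite, a + b would first be materialised as its own String and then copied again when c is appended; the builder version copies each piece once into a shared buffer.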

Most of the heavy lifting is done later, by the Just-in-Time compilers at runtime.