Scala 3 Metaprogramming - Inline

Exploring “inline”

This page is adapted from a talk I gave at Scala in the City, July 29, 2021.

The following examples are adapted from “Programming Scala, Third Edition”.

What’s All This About Then?

inline is part of the new metaprogramming facilities in Scala 3. It forces the compiler to “inline” the marked code, if feasible:

- Method bodies are expanded in place, removing the method invocation overhead.
- Conditionals are replaced with the branch that would be taken.
  - This means the predicate value must be determined at compile time.

Let’s see some examples. Here is an inlined method:

// src/script/scala/progscala3/meta/inline/Recursive.scala

inline def repeat(s: String, count: Int): String =
  if count == 0 then ""
  else s + repeat(s, count-1)

Let’s try it and see what happens:

repeat("hello", 3)    // Okay

val n=3
repeat("hello", n)    // ERROR!

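The error looks like the following. (This output was captured for a different call, time(repeat("hello",100_000)), which runs into the same limit.)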

1 |time(repeat("hello",100_000))
  |     ^^^^^^^^^^^^^^^^^^^^^^^
  |     Maximal number of successive inlines (32) exceeded,
  |     Maybe this is caused by a recursive inline method?
  |     You can use -Xmax-inlines to change the limit.
  | This location contains code that was inlined from rs$line$8:1
  | This location contains code that was inlined from rs$line$1:3
  |...
  | This location contains code that was inlined from rs$line$1:3
  | This location contains code that was inlined from rs$line$1:3

Note that the maximum number of successive inlines allowed defaults to 32. Even though it is configurable, after a certain point the gains from eliminating method invocations will be outweighed by byte code bloat…

However, we can fix this error as follows:

inline val n = 3
repeat("hello", n)
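As a minimal, self-contained sketch of this fix (repeat is restated from above; the names viaLiteral and greeting are only illustrative), an inline val is itself a compile-time constant, so the compiler can still evaluate count at every expansion:

```scala
// Same definition as above, repeated so this snippet stands alone.
inline def repeat(s: String, count: Int): String =
  if count == 0 then ""
  else s + repeat(s, count - 1)

// An inline val must be initialized with a compile-time constant.
inline val n = 3

// Both calls are expanded at compile time and yield the same string.
val viaLiteral = repeat("hello", 3)
val greeting   = repeat("hello", n)
```

Replacing inline val with a plain val brings the error back, because n is then only known at run time.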

Even if we’re limited to 32, does it noticeably affect performance? Here’s the same method, non-inlined:

def repeatNI(s: String, count: Int): String =   // "NI" for "not inlined"
  if count == 0 then ""
  else s + repeatNI(s, count-1)

Here’s a timer method we can use to compare them:

def time[R](f: => R): R =
  val startNanos = java.lang.System.nanoTime
  val r = f
  val endNanos = java.lang.System.nanoTime
  println(s"${endNanos - startNanos}ns")
  r

Running on my relatively recent MacBook Pro 16” laptop (Intel, not an M1 ;^):

scala> time(repeat("hello",31))
4535ns
val res4: String = hellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohello

scala> time(repeatNI("hello",31))
12042ns
val res5: String = hellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohellohello

About 2.7 times faster, but really, this can’t be trusted too much. If you run these lines repeatedly, you’ll see a lot of variance. We also have the JVM “Hot Spotting” things for us…
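To reduce some of that variance, one option is to warm up first and then average over many runs. The following is only a rough sketch: averageNanos is a hypothetical helper of mine, not something from the book, and the warm-up and run counts are arbitrary. For measurements you want to rely on, a benchmarking harness such as JMH is a better choice.

```scala
// Hypothetical helper (an assumption, not from the book):
// average wall-clock nanoseconds per evaluation of f.
def averageNanos[R](warmups: Int, runs: Int)(f: => R): Double =
  var i = 0
  while i < warmups do      // give the JIT a chance to compile the hot path
    f
    i += 1
  val start = System.nanoTime
  i = 0
  while i < runs do
    f
    i += 1
  (System.nanoTime - start).toDouble / runs

// The non-inlined method from above, repeated so this snippet stands alone.
def repeatNI(s: String, count: Int): String =
  if count == 0 then ""
  else s + repeatNI(s, count - 1)

val avgNI = averageNanos(warmups = 10_000, runs = 100_000)(repeatNI("hello", 31))
```

The same helper can wrap a call to the inlined repeat for a side-by-side comparison. The results are discarded inside the loops, so the JIT is free to optimize some of the work away; treat the numbers as indicative only.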

But wait, there’s more! We can inline the conditional:

// src/script/scala/progscala3/meta/inline/ConditionalMatch.scala

inline def repeat2(s: String, count: Int): String =
  inline if count == 0 then ""
  else s + repeat2(s, count-1)

repeat2("hello", 3)    // Okay
val n = 3
repeat2("hello", n)    // ERROR!

Now, an error is reported first while compiling the conditional:

1 |repeat2("hello", n)    // ERROR!
  |^^^^^^^^^^^^^^^^^^^
  |Cannot reduce `inline if` because its condition is not a constant value: n.==(0)
  | This location contains code that was inlined from rs$line$18:2

What does it mean to inline the conditional? It means that the byte code inserted will correspond to either the true or the false branch; the conditional test and the branch not taken will not be in the byte code.

To be clear, when we call repeat2("hello", 3), it’s basically as if we wrote the following code instead, where I added parentheses () to show each expansion:

"hello" + ("hello" + ("hello" + ""))

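For repeat, an ordinary if whose condition turns out to be a constant is also simplified after inlining, which is why repeat("hello", 3) compiled at all. The difference is that inline if requires the simplification: when the condition isn’t a compile-time constant, you get an immediate error at the conditional instead of the compiler inlining repeatedly until it hits the limit.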
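Here is a small sketch that checks the hand-written expansion of repeat2 against the real thing (repeat2 is restated from above; expanded and byHand are illustrative names):

```scala
// Same definition as above, repeated so this snippet stands alone.
inline def repeat2(s: String, count: Int): String =
  inline if count == 0 then ""
  else s + repeat2(s, count - 1)

// What the compiler produces for a constant count of 3...
val expanded = repeat2("hello", 3)

// ...and the expansion written out by hand.
val byHand = "hello" + ("hello" + ("hello" + ""))
```

Both values are the same string, as you would expect.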

Perhaps you’ve done a similar exercise yourself on paper while learning how a recursive function works?

Let’s try our timing test again:

scala> time(repeat("hello",31))
4455ns    # roughly the same as last time...

scala> time(repeat2("hello",31))
4399ns

Not much difference in this case. Again, there is lots of variance in the actual run times.

Finally, we can inline match expressions:

inline def repeat3(s: String, count: Int): String =
  inline count match
    case 0 => ""
    case _ => s + repeat3(s, count-1)

I’ll let you play with this yourself. Would you expect the performance to be better, worse, or about the same compared to repeat2?
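As a starting point for that experiment, here is a sketch that first confirms repeat3 agrees with the non-inlined version for a constant count (both definitions are restated from above; viaMatch and viaCall are illustrative names):

```scala
// Same definitions as above, repeated so this snippet stands alone.
inline def repeat3(s: String, count: Int): String =
  inline count match
    case 0 => ""
    case _ => s + repeat3(s, count - 1)

def repeatNI(s: String, count: Int): String =
  if count == 0 then ""
  else s + repeatNI(s, count - 1)

// The inline match is resolved at compile time, one case per expansion.
val viaMatch = repeat3("hello", 3)

// The ordinary method makes the same decisions at run time.
val viaCall = repeatNI("hello", 3)
```

From here you can wrap each call in the timer and compare.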

Don’t abuse this feature. Besides being limited to calling repeat with compile-time constants, you’ll create code bloat and may not always get performance improvements.

A Non-trivial Example

Let’s write a function that optionally checks that an invariant condition is preserved by a method call. If you’ve heard of Design by Contract, this might be something you’ve thought about.

Basically, I want an easy way to test a condition before and after some operation, confirming that it is satisfied both times, but I also want to “compile it away” before I go to production. Oh, and I also want a pony…

This example also uses quoting and splicing, the building blocks for macros in the new metaprogramming system. I explain this example in this Scala 3 blog post:

// src/main/scala/progscala3/meta/Invariant.scala
package progscala3.meta
import scala.quoted.*    

object invariant:
  // Compile-time constant to enable or disable checking
  inline val ignore = false    

  // Test the predicate before and after evaluating the block.
  inline def apply[T](
      inline predicate: => Boolean, message: => String = "")( 
      inline block: => T): T =
    // From what you now know, what does the compiler output for byte code?
    inline if !ignore then
      if !predicate then fail(predicate, message, block, "before")
      val result = block
      if !predicate then fail(predicate, message, block, "after")
      result
    else
      block

  // Use inline to insert the definition at compile time.
  // The splice ${...} is inserted in the byte code.
  inline private def fail[T](
      inline predicate: => Boolean,
      inline message: => String,
      inline block: => T,
      inline beforeAfter: String): Unit =
    ${ failImpl('predicate, 'message, 'block, 'beforeAfter) }   

  case class InvariantFailure(msg: String) extends RuntimeException(msg)

  // Note the quote '{...} and the argument types
  private def failImpl[T](
      predicate: Expr[Boolean], message: Expr[String],
      block: Expr[T], beforeAfter: Expr[String])(
      using Quotes): Expr[String] =
    '{ throw InvariantFailure(  
      s"""FAILURE! predicate "${${showExpr(predicate)}}" """
      + s"""failed ${$beforeAfter} evaluation of block:"""
      + s""" "${${showExpr(block)}}". Message = "${$message}". """)
    }

  private def showExpr[T](expr: Expr[T])(using Quotes): Expr[String] =
    val code: String = expr.show 
    Expr(code)

When ignore is true, the entire implementation of apply reduces to block! If ignore is false, then the following code is compiled:

      if !predicate then fail(predicate, message, block, "before")
      val result = block
      if !predicate then fail(predicate, message, block, "after")
      result
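If you want to try the “compile it away” behavior without the macro machinery, here is a simplified sketch. invariantLite is a hypothetical stand-in of mine, not the book’s code: it keeps the inline val and inline if, but it throws directly instead of calling the quoted fail helper, so the failure message won’t contain the source text of the predicate or block.

```scala
object invariantLite:
  // Flip to true and recompile to remove the checks entirely.
  inline val ignore = false

  final case class InvariantFailure(msg: String) extends RuntimeException(msg)

  inline def apply[T](
      inline predicate: => Boolean, message: => String = "")(
      inline block: => T): T =
    inline if !ignore then
      if !predicate then throw InvariantFailure(s"before block: $message")
      val result = block
      if !predicate then throw InvariantFailure(s"after block: $message")
      result
    else
      block

var okay = true

// The predicate holds before and after, so the block's value is returned.
val fine = invariantLite(okay) { "hello world!" }

// The block breaks the invariant, so the "after" check throws.
val broken = scala.util.Try {
  invariantLite(okay, "okay was flipped") {
    okay = false
    "hello world!"
  }
}
```

With ignore set to true, both calls would reduce to just their blocks and the second one would no longer fail.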

The benefit of using a macro implementation is that the InvariantFailure message string will contain the code for predicate and block, so it’s easier to see what failed. Here’s an example of what happens if you use this in sbt console for the Programming Scala code examples, where this code is already compiled:

scala> import progscala3.meta.*

scala> var okay = true    // Something we'll use in the predicate and block
var okay: Boolean = true

scala> invariant(okay == true) {
     |   "hello world!"
     | }
val res0: String = hello world!   // works fine

scala> invariant(okay == true) {
     |   okay = false
     |   "hello world!"
     | }
progscala3.meta.invariant$InvariantFailure: FAILURE! predicate "repl.rs$line$40.okay.==(true)" failed after evaluation of block: "{
  repl.rs$line$40.okay = false
  "hello world!"
}". Message = "".
  at progscala3.meta.invariant$InvariantFailure$.apply(Invariant.scala:26)
  ...

Note that the error message contains the predicate, okay.==(true). The operator notation has been desugared to a regular method call, and I’m omitting the REPL prefix here. The message also contains the block that caused the error.

See the blog post for more details.

Overrides

If we have time, let’s discuss a few subtleties.

There’s some subtle behavior with method definitions and overrides: declaring the abstract method inline behaves differently from declaring a “normal” abstract method and making only the overrides inline. First, the latter case:

// src/script/scala/progscala3/meta/inline/Overrides.scala

trait T:
  def m1: String
  def m2: String = m1

object O extends T:
  inline def m1 = "O.m1"
  override inline def m2 = m1 + " called from O.m2"

val t: T = O
assert(O.m1 == t.m1)
assert(O.m2 == t.m2)

Now notice what happens if the parent’s abstract method is inline:

trait T2:
  inline def m: String

object O2 extends T2:
  inline def m: String = "O2.m"

val t2: T2 = O2
O2.m
t2.m       // ERROR

The last line fails to compile:

1 |t2.m
  |^^^^
  |Deferred inline method m in trait T2 cannot be invoked

Transparent Inline

This is non-obvious, but notice what happens if you add transparent as well as inline:

// src/script/scala/progscala3/meta/inline/Transparent.scala

open class C1
class C2 extends C1:
  def hello = "hello from C2"

transparent inline def make(b: Boolean): C1 = if b then C1() else C2()

val c1: C1 = make(true)
// c1.hello                             // C1.hello doesn't exist!
val c2: C2 = make(false)
c2.hello                                // Allowed!

Even though make is declared to return a C1, the compiler actually knows that a C2 instance is returned in the second case, so it allows us to assign the returned value to c2, of type C2! While convenient, this technically breaks the usual “contract” for method return types and assignments. Normally, it would be a compilation error for c2 to be of type C2, so I’m not sure it’s really a good idea to use this feature, except in rare cases.
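The effect is perhaps easier to see with two unrelated types. In this sketch, pick is a hypothetical method (not from the book) that is declared to return Any. I’ve used an inline parameter and inline if here, which is my choice rather than something the example above requires, so that the branch is guaranteed to be selected at compile time:

```scala
// Declared to return Any, but `transparent` lets each call site
// see the type of the branch that was actually selected.
transparent inline def pick(inline b: Boolean): Any =
  inline if b then 1 else "one"

val i: Int    = pick(true)     // the expansion is just 1
val s: String = pick(false)    // the expansion is just "one"
```

Without transparent, neither assignment would compile, because all the compiler would know at the call site is the declared type Any.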