Tracing back Scala Future chains in asynchronous programming

Preface: synchronous vs asynchronous

In the era of synchronous programming every software developer used to be able to use stack traces to trace function invocation chains from the place an error occurred, back to the root cause of the problem. I suspect that most of us can relate to the process of debugging the code by burrowing through backtraces in log files. In the production environment, it often was the only way of finding the root cause of the bug.

Rules of the game have changed, however, with asynchronous programming becoming more popular. In asynchronous code, function invocations that form a logical chain in the source code are not confined to one thread, so stack traces are not of much help in tracing back the execution. In this blog post I’ll cover some ways of tackling this problem when using Scala’s `Future`s. Using `Future`s is not the only way to write asynchronous code, but this article will focus on this topic only.

The problem

To make this article somewhat more complete I feel obliged to describe what a stack trace really is and what is so different when using `Future`s. If you feel comfortable with your knowledge about this topic jump right to the solutions.

On the JVM, each thread has its own data area called a stack^[1]. Every time a method is invoked, its handle is pushed to the top of the stack. The handle is removed from the stack (popped) when the method returns. Nested method invocations cause the stack to contain all method handles that are part of the invocation chain. By default, every time an `Exception` is instantiated it saves a current thread’s stack trace which can then be inspected when the exception is caught^a.

Have a look at this small example:

object Sync extends App {
    def m1: String = "a" + m2
    def m2: String = "b" + m3
    def m3: String = throw new Exception

    m1
}

In the above program, after method `m1` is invoked, when `m3` is executing, the stack contains references to `m1`, `m2` and `m3` and the following is printed:

Exception in thread "main" java.lang.Exception
    at snippets.Sync$.m3(snippets.scala:6)
    at snippets.Sync$.m2(snippets.scala:5)
    at snippets.Sync$.m1(snippets.scala:4)
    at snippets.Sync$.delayedEndpoint$snippets$Sync$1(snippets.scala:8)
    ...the rest of internal calls

On the contrary, when using `Future`s, logical invocation chain is not executed by a single thread. Every time you use `map`, `flatMap` or other `Future` method that accepts an `ExecutionContext`, what you really do is you register a callback that is executed in the thread provided by the `ExecutionContext`. When an exception is thrown in the callback code it can’t access a full chain that could include all `map` and `flatMap` invocations.

object Async extends App {

    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration
    import scala.concurrent.{Await, Future}

    def m1: Future[String] = m2.map("a" + _)
    def m2: Future[String] = m3.map("b" + _)
    def m3: Future[String] = Future(throw new Exception)

    Await.result(m1, Duration.Inf)
}

Above, after method `m1` is invoked, when the exception is thrown in `m3` the stack trace contains neither a reference to `m1` nor to `m2` because invocations of these methods took place in different threads. In this case, the following is printed:

Exception in thread "main" java.lang.Exception
    at snippets.Async$.$anonfun$m3$1(snippets.scala:19)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:653)
    at scala.util.Success.$anonfun$map$1(Try.scala:251)
    at scala.util.Success.map(Try.scala:209)
    ...the rest of internal calls

The conclusion is that stack traces have an inherent limitation of tracing invocations of just one thread. Chapters below describe how to overcome this limitation.

Compile-time approach

One of the solutions to the problem is to create a subtype of `Future`, say – `TracingFuture` and provide overloaded method definitions for `map`, `flatMap` etc. In each of these methods a “stack” frame may be created and saved inside the `TracingFuture` which then can be accessed on failure. To get source code location needed for the “stack” frame creation we’ll use an excellent sourcecode library that uses Scala macros under the hood.

A very similar approach was proposed by Johannes Rudolph in this GitHub repository.

import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

case class FutureTraceElement(enclosing: String, method: String, file: String, line: Int)

class TracingFuture[+T](
    underlying: Future[T],
    val trace: Vector[FutureTraceElement])
 extends Future[T] {

    def map[S](f: (T) => S)(implicit ec: ExecutionContext,
                                        enclosing: sourcecode.Enclosing,
                                        file: sourcecode.File,
                                        line: sourcecode.Line): TracingFuture[S] = {
      val mapped = underlying.map(f)
      new TracingFuture[S](
            mapped,
            trace :+ FutureTraceElement(enclosing.value, "map", file.value, line.value)
      )
    }

    def onComplete[U](f: (Try[T]) => U)(implicit ec: ExecutionContext): Unit =
     underlying.onComplete(t => f(tracedTry(t)))

    private def tracedTry[A](t: Try[A]): Try[A] = t match {
     case s: Success[A] => s
     case Failure(ex) =>
      val origSt = ex.getStackTrace.toVector
      val enrichedSt = origSt ++ trace.map { fte =>
       new StackTraceElement(fte.enclosing, fte.method, fte.file, fte.line)
      }
      ex.setStackTrace(enrichedSt.toArray)
      Failure(ex)
    }

    //... other methods elided
}

In `onComplete` an original stack trace of the exception is enriched with the “future trace” that contains all methods which took part in future chaining. `FutureTraceElement` that contains information about enclosing class name and a source location is created using `implicit` values provided by the sourcecode library.

If we replace all `Future`s with `TracingFuture`s in the previous snippet, the following trace will be printed:

Exception in thread "main" java.lang.Exception
    at snippets.TracingAsync$.$anonfun$m3$2(snippets.scala:34)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:653)
    ... some internal calls elided
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
    at snippets.TracingAsync.m3.apply(/absolute/path/snippets.scala:34)
    at snippets.TracingAsync.m2.map(/absolute/path/snippets.scala:33)
    at snippets.TracingAsync.m1.map(/absolute/path/snippets.scala:32)

First, the original stack trace is printed, and then the “future trace” frames follow.

This solution is nice because it mostly works at compile time and thus has close to zero performance overhead. It forces a programmer to use `TracingFuture` instead of the standard `Future`, though – it may be tedious to introduce this to a large codebase.

You can find a complete example in the demo project.

Runtime bytecode instrumentation

The second solution is to perform runtime bytecode instrumentation via JVM agent API. JVM allows you to register _agents_ that modify classes in runtime. If you use the agent, you can change the behavior of the `Future` trait’s methods like `map` and `flatMap` to do the similar thing as our compile-time solution. Implementing such agent is a much more complex task than creating a rather simple `TracingFuture` class from the previous solution. The code is too long to walk through it all in this blog post. I’ll explain how to use this approach, though.

Ruslan Shevchenko implemented such agent in his GitHub repository. The project may not be 100% complete but it works just fine for most applications. I’ll use it as an example.

The project works as follows: bytecode is modified and as a result every `Future` function call that accepts `ExecutionContext` is intercepted and custom implementation is invoked instead of the original. Custom implementation obtains the current stack trace and stores it in Scala’s `DynamicVariable` bound to the thread that is going to execute the callback. The snippet below presents such intercepting function (in this case – a `flatMap`).

def rFlatMap[A, B](future: Future[A],
                   fun: A => Future[B],
                   executor: ExecutionContext): Future[B] = {
    val trace = Thread.currentThread.getStackTrace
    val prevTrace = new StackTraces(trace, ThreadTrace.prevTraces.value)
    future.flatMap { a => trackedCall(fun(a), prevTrace) }(executor)
}

To enable the agent you must first obtain the JAR file containing it. I prebuilt it, so that you don’t have to compile it yourself. It’s available here. Please note that the linked JAR file is built for Scala 2.12. To register the agent, pass `-javaagent:/path/to/the/agent/agent_2.12.jar` argument to the `java` command executing your program.

If you ran our previous example with the agent enabled, you’d get the following trace:

Exception in thread "main" java.lang.Exception
    at snippets.Async$.$anonfun$m3$1(snippets.scala:19)
    at trackedfuture.runtime.TrackedFuture$$anon$2.run(TrackedFuture.scala:28)
    ...some internal calls elided
    at java.lang.Thread.getStackTrace(Thread.java:1556)
    at trackedfuture.runtime.TrackedFuture$.apply(TrackedFuture.scala:19)
    at trackedfuture.runtime.TrackedFuture$.rapply(TrackedFuture.scala:14)
    at trackedfuture.runtime.TrackedFuture.rapply(TrackedFuture.scala)
    at snippets.Async$.m3(snippets.scala:19)
    at snippets.Async$.m2(snippets.scala:18)
    at snippets.Async$.m1(snippets.scala:17)
    at snippets.Async$.delayedEndpoint$snippets$Async$1(snippets.scala:21)
    at snippets.Async$delayedInit$body.apply(snippets.scala:11)
    at scala.Function0.apply$mcV$sp(Function0.scala:34)
    at scala.Function0.apply$mcV$sp$(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App.$anonfun$main$1$adapted(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:378)
    at scala.App.main(App.scala:76)
    at scala.App.main$(App.scala:74)
    at snippets.Async$.main(snippets.scala:11)
    at snippets.Async.main(snippets.scala)

As you can see, in this solution `Future` combinator calls are interleaved with “normal” stack trace.

The bytecode instrumentation has some advantages over the compile-time solution. You don’t have to change the code in any way. You can opt-in and opt-out just by enabling or disabling the agent. However, this solution comes with a price – it has a major performance hit. Every wrapped method call requests a whole thread stack trace from the JVM^[2] – some might find it unacceptable for the use in production. Also, more things can break since the bytecode instrumentation is a tricky business and has to be used with care.

Again, see the demo project for a complete example.

Demo project

I implemented a very simple asynchronous “calculator” to present the solutions and also to compare them. It fails for some inputs, generating a stack trace. The code is available here.

To try it out, first clone the repository:

$ git clone git@github.com:povder/tracing-future.git

Start a sbt’s interactive mode:

$ cd /path/to/repo/tracing-future
$ sbt
[info] Loading project definition from /path/to/repo/tracing-future/project
[info] Set current project to root (in build file:/path/to/repo/tracing-future/)

And run a chosen project. To run a project without any tracing, execute the following:

sbt> notracing/run

For a compile-time solution, execute:

sbt> tracing/run

For an agent-based solution, execute:

sbt> agent/run

Each execution will fail, but with differing stack traces.

Summary

We all are in the need for async to make systems we design scalable. With the introduction of such clever mechanisms as `Future`s combinators, asynchronous programming became a lot more convenient than at its beginnings (does anybody remember the callback hell?). It can get even more convenient with an easy error tracing using the solutions presented in this article – I hope that you’ll find them useful.

Notes

a. This behavior can be overridden, see this Stack Overflow answer.

References

Lindholm, Yellin, Bracha, Buckley. (2015) Java Virtual Machine Specification: Java Virtual Machine Stacks
Aleksey Shipilev. (2014) The Exceptional Performance of Lil’ Exception, (chapter on exception instantiation)