Scala path-dependent types – A real world example

In this post, ShiftForward’s Rui Gonçalves presents a Scala feature that has recently helped us through a design choice in our AdForecaster system: more concretely, a feature of its type system.

A type system is, indeed, a very powerful tool for a programming language. In fact, it is a concrete case of a formal method; it provides us a near mathematical proof that programs that compile successfully obey a set of rules and restrictions and, as such, are bug-free in that respect. For example, after compiling a Java program, we know for sure that the it will not do a division of a number with a string and that, if our method foo receives an integer as argument, it will not be called anywhere with an argument of any type other than an integer.

This blog post focuses on how a programming language can be used to model more advanced code restrictions and check them at compile time. By checking them at compile time, such languages enable the detection of bugs which otherwise would only be caught at runtime, when systems are already deployed (or by manually writing tests – though tests can be wrongly written and may not cover all potential failure cases). We assume basic knowledge of Java programming, even though we feel it may not be necessary to delve into the code to understand the main problem and concepts presented.

Consider the following scenario:

  • Our forecast system, at a particular stage, simulates the ad server decision mechanism in order to predict the number of hits for each campaign;
  • The delivery algorithm is modelled as a component with two operations:
    • Pre-process. It must be able to pre-process a list of campaigns, yielding a structure that allows faster evaluation of delivery rules. For example, it can sort the campaigns according to their priority or index campaigns according to a targeting rule;
    • Execute. It must be able to quickly determine which campaign will be delivered for a given request, using the aforementioned structure. This decision step is done a huge number of times – once per simulated request;
  • Different algorithms can yield completely different structures in the pre-processing phase;
  • The algorithm must be stateless, as a single instance is shared by many concurrent threads and is used to execute many forecasts.

We want our system to be decoupled from the delivery algorithm it is using, in order for us to switch the delivery algorithm according to our clients’ needs and to easily introduce new features, such as pacing and tandem. Moreover, the specific algorithm to use may not yet be known at compile time, e.g. its name can be indicated as a string via API.

Let’s think how we could do this using Java. The most natural encoding we could do would be to make our Algorithm a generic class, like this:

public interface Algorithm<CampaignContainer> {
    CampaignContainer preprocess(List<Campaign> campaigns);
    Campaign run(CampaignContainer campaigns, Request req); // returns null if no booking was made
}

 

Following this approach, our main Forecast class would look like this:

public class Forecast {
    <CampaignContainer> void forecast(List<Campaign> campaigns, List<Request> requests, Algorithm<CampaignContainer> algorithm) {
        CampaignContainer preprocessedCampaigns = algorithm.preprocess(campaigns);
        List<Campaign> results = new ArrayList<>();
        for(Request req: requests) {
            results.add(algorithm.run(preprocessedCampaigns, req));
        }
        // (...)
    }
}

 

Note that the forecast method was also made generic, in order to accept heterogeneous Algorithm classes.

As we said before, the specific algorithm to use is not known at compile time. To instantiate an algorithm given a class name – using a mechanism known as reflection – we can do:

<CampaignContainer> Algorithm<CampaignContainer> getAlgorithm(String className) throws Exception {
    return (Algorithm< CampaignContainer>) Class.forName(className).newInstance();
}

 

This method will cause a compiler warning, indicating that an unsafe cast is being done. That happens because the compiler erases generic types; e.g., a program can’t distinguish between an Algorithm<String> and an Algorithm<Integer>.

However, there is a bigger problem: we don’t know which specific type is CampaignContainer when we call forecast! When we call the getAlgorithm method, we should know in advance what is the type of CampaignContainer and, as it can be anything, this solution is not suitable to support arbitrary structures as an output of the pre-processing step.

In Java, the only way we could do this is to forget about types and generics at all and make our algorithms return and receive objects of type Object (which is a superclass of every other class) instead of CampaignContainer:

interface Algorithm {
    Object preprocess(List<Campaign> campaigns);
    Campaign run(Object campaigns, Request req); // returns null if no booking was made
}

 

This is obviously a bad and unsafe design choice, as the run method is now able to take any object as parameter and will yield runtime errors if some method calls it in the wrong way. This also implies that, inside run, algorithms must always cast the campaigns argument to its concrete type, which is a seemingly unnecessary operation and probably expensive overall, considering that run will be called a huge number of times. (Relinquishing type-safety)

When comparing programming languages, we should evaluate for ourselves the flexibility they give us to correct and easily encode our control flows and restrictions. Let’s think for a minute here: must the Forecast class really have to know what is the campaign container? Given a specific instance of Algorithm (without any more information), the only thing we want to enforce is that the first input of the run method is of the same type as the return value of a preprocess call. Besides that, it doesn’t really matter what is the particular type of the container.

In Scala, this problem can be properly encoded using a feature called path-dependent types. Let’s see how the equivalent code in that language looks like:

trait Algorithm {
    type CampaignContainer // an abstract type field; it must be defined by subclasses
    def preprocess(campaigns: Seq[Campaign]): CampaignContainer
    def run(campaigns: CampaignContainer, req: Request): Campaign
}
class Forecast {
    def forecast(campaigns: Seq[Campaign], requests: Seq[Request], algorithm: Algorithm) {
        val preprocessedCampaigns: algorithm.CampaignContainer = algorithm.preprocess(campaigns)
        val results = requests.map { req => algorithm.run(preprocessedCampaigns, req) }
        // (...)
    }
}

 

Scala has the concept of abstract types: types that, just like abstract methods, must be defined by subclasses. Our CampaignContainer here is declared as an abstract type in the Algorithm class, and it is the concrete subclass of the latter that defines what a CampaignContainer is. Therefore, the abstract type CampaignContainer is dependent of the instance it refers to.

Let’s quickly see a concrete algorithm that returns a Map mapping string keys to campaigns in the pre-processing step:

class SimpleAlgorithm extends Algorithm {
    type CampaignContainer = Map[String, Campaign]
    def preprocess(campaigns: Seq[Campaign]): Map[String, Campaign] = (...) // code for creating the Map here
    def run(campaigns: Map[String, Campaign], req: Request): Option[Campaign] = (...) // code for choosing the campaign here
}

 

The best part is that our Forecast class – a class from the outside of this type hierarchy – can refer to this abstract type and use it in their variables. In the forecast method, our preprocessedCampaigns is an instance of algorithm.CampaignContainer; it is called a path-dependent type. This models the exact relation we wanted for our entities: independently of the concrete type of our Algorithm, we assure that the run method (that, in forecast, takes an algorithm.CampaignContainer as argument) can only take structures returned by a call to preprocess on the algorithm instance. It is important to note that all of these restrictions are enforced at compile time and, as such, do not affect the runtime performance of our system.

More advanced type systems, such as the one featured in Scala, increase our confidence in the systems we develop and help us reduce the number of bugs before deployment – which ultimately benefits, without a doubt, our clients. It is important to note that these important validations doesn’t require extra effort from developers – no need to manually write tests or to run third-party tools – and have no runtime performance impact, which is always a key concern when developing online advertisement systems.

The complete source code of this example can be seen here: AlgorithmJava and AlgorithmScala. If you are curious and have some knowledge in Java, I recommend you to compare the syntax of both languages and to read further about Scala here.

Tags: , ,