Karim Ali | SCALACG - a collection of low-cost call graph analyses for Scala

As Scala gains popularity, there is growing interest in programming tools for it. Such tools often require call graphs. Applying existing call graph construction algorithms to the JVM bytecodes generated by the Scala compiler produces imprecise results due to type information being lost during compilation. Therefore, we propose adapting existing call graph construction algorithms, Name-Based Resolution (RA) and Rapid Type Analysis (RTA), for Scala.

We have implemented 5 algorithms: RA, TCA^names, TCA^bounds, TCA^expand, and TCA^expand-this. The algorithms are implemented as a Scala compiler plugin. Our experimental evaluation shows that handling of complex Scala constructs, e.g., traits and abstract type members, greatly improves the precision of the constructed call graphs. For more details on the algorithms, and a formal proof of soundness, please refer to our ECOOP’14 paper.

Downloads

scalabench.tar.gz: a self contained tarball that include all the necessary scripts, runnable JARs, required to replicate our experiments.
scalacg.tar.gz: the source code of our Scala compiler plugin. This tarball contains the source code of the implementation of all our algorithms, and RTA^wala. Additionally, it comes with two test suites: one for TCA-based algorithms, and another one for RA. The source code can be imported as an Eclipse Scala project (Note: you need the Scala-IDE plugin for Eclipse to properly identify this Scala project).
callgraph-plugin.jar: the Scala compiler plugin that contains the implementation of our algorithms. Additionally, it contains the ProBe tool that we use to compare and output call graphs.
walacg.jar: a runnable JAR for the implementation of RTA^wala.
scalavm.ova: a VirtualBox appliance that is set up to replicate our experiments. It has scalabench.tar.gz available on the desktop. You do not need to worry about any prerequisites if you are using this virtual machine. Administrator username: aec, password: ecoop14.

Prerequisites

These are the prerequisites for using our plugin with the Scala compiler.

Scala-2.10.2, the plugin will crash if another Scala version is used.
Set the environment variable $SCALA_HOME (%SCALA_HOME% on Windows) to the location where you extract the Scala-2.10.2 tarball.
Set the environment variable $PATH to $PATH:$SCALA_HOME/bin (%PATH%;%SCALA_HOME%\bin on Windows).
Set the environment variable $JAVA_OPTS (%JAVA_OPTS% in Windows) to -Xmx2g -XX:PermSize=512m -XX:MaxPermSize=512m. This will give a maximum of 2G of RAM for the plugin to run. This should be enough for small programs, test cases. Increase the value of -Xmx if Java runs out of memory. It is worth noting here that the provided virtual machine requires 16G of RAM, and has the Java option -Xmx set to 12G.

Additional prerequisites should be satisfied if you would like to use scalabench.tar.gz to replicate our experiments. You do not have to worry about these if you are using the scalavm.ova virtual machine provided above.

Bash: the scripts use the Bash Unix shell to execute.
Ant: please refer to the official installation instructions to install ant on your machine.
Set the environment variable $ANT_OPTS to $JAVA_OPTS. If Java runs out of memory, adjust the environment variable $JAVA_OPTS accordingly.
walacg.jar includes a file named wala.properties that provides WALA with the location of the Java runtime directory. It’s set by default to /usr/lib/jvm/default-java/jre/lib. If the Java runtime directory on your machine is different from this, then you need to update this file with the correct path. You can either, edit the file inside the walacg.jar archive directly, or create an empty file with the name wala.properties at the same directory of walacg.jar. Then, set the property java_runtime_dir in it to the Java runtime directory on your system. You then need to run the following command to update the properties file inside walacg.jar:

$ jar uf walacg.jar wala.properties

LaTeX: this is only required if you would like to get a PDF output of our experimental results. The scripts used to replicate our experiments generate the output data in CSV format as well (tab separated). If you use LaTeX, please make sure you have the following LaTeX packages installed: authblk, fullpage, and multirow.

Usage

You can use our call-graph plugin to compile Scala code as follows, where <analysis> can be any value of: ra, tca-names, tca-bounds, tca-expand, or tca-expand-this, and defaults to tca-expand-this.

$ scalac -cp callgraph-plugin.jar -Xplugin:callgraph-plugin.jar [-P:callgraph:<analysis>] <code.scala>

The plugin will output the call graph in a file named callgraph.txt.gzip, along with a summary call graph (i.e., ignoring the edges within the library dependencies) in a file with the name callgraph-summary.txt.gzip. Both files can be viewed and queried by the ProBe tool included in callgraph-plugin.jar. To print out the edges of the call graph, run the following command:

$ java -cp callgraph-plugin.jar probe.CallGraphInfo callgraph.txt.gzip

For more information on how to use ProBe, please refer to this page.

Replicating our experiments

You can either download the virtual machine, or download the tarball scalabench.tar.gz (both in the downloads section above) and then satisfy all the prerequisites. The following tutorial does not depend on which path you decide to take.

You can run all analyses for one of our benchmark programs (see below) by executing the following commands:

$ cd <scalabench.tar.gz_extracted_location>
$ ./run <benchmark_name>

This will run each of our algorithms, RTA^wala, and the regular Scala compiler on the given benchmark. The output of the plugin will be organized in sub-folders of scalabench, following the pattern: scalabench/dist/<analysis_name>/<benchmark_name>

If you would like to run all analyses for all benchmark programs at once, you can use the scripts provided in scalabench.tar.gz as follows:

$ cd <scalabench.tar.gz_extracted_location>
$ ./run-all

If the script crashes due to memory-related issues, please set the environment variable $JAVA_OPTS with a higher value for the -Xmx option. If everything goes well, this process can take ~ 4.5 hours. So it might be a good idea to grab a cup of coffee, read a book, or attend to other items on your agenda while the scripts finish running. The output of the script is organized in a similar fashion to analyzing just one benchmark. Next, the script checks if the generated call graphs satisfy the following sanity check: TCA^expand-this ⊆ TCA^expand ⊆ TCA^bounds ⊆ TCA^names ⊆ RA

Finally, the script generates the data we present in our ECOOP’14 paper in both CSV format (under scalabench/csv) and LaTeX format (under scalabench/tex). If you have LaTeX installed, run the following script and the resulting PDF will be at scalabench/Results.pdf

$ cd <scalabench.tar.gz_extracted_location>
$ ./makelatex

Benchmark programs

Program	Description	Source
argot	a command-line parser for Scala	http://github.com/bmc/argot
ensime	a Scala Interaction Mode for Emacs	http://github.com/aemoncannon/ensime
fimpp	an interpreter for FIM++	http://github.com/KarolS/fimpp
kiama	a Scala library for language processing	http://code.google.com/p/kiama
phantm	a static analyzer for PHP applications	http://github.com/colder/phantm
scalaxb	an XML data-binding tool for Scala	http://github.com/eed3si9n/scalaxb
scalisp	a LISP interpreter written in Scala	http://github.com/Mononofu/Scalisp
see	a Scala arithmetic expression engine	http://scee.sourceforge.net
squeryl	a Scala ORM and DSL for SQL databases	http://github.com/max-l/Squeryl
tictactoe	the classic “tic-tac-toe” game	http://github.com/nickknw/arbitrarily-sized-tic-tac-toe

ECOOP’14 Validated Artifact

All of the experiments that are discussed in the tutorial above is part of the artifact we have submitted to ECOOP’14 in Uppsala, Sweden. scalacg has been verified by the Artifact Evaluation Committee to be consistent, complete, well-documented, and easy to reuse. scalacg also went through to win the Distinguished Artifact Award later on.