Andre Leiradella

Generalist Programmer | printf("%s@%s.com", "andre", "leiradella");

2011-10-24

Auto-vectorization for the Masses (part 5 of n) - Code Generation, Graphs and A-star

by Andre Leiradella


This is part 5 of the series; you may want to read the previous articles before this one.

Introduction

I’ve been writing the code generation part of the SimpleC compiler, but it has proven to be much harder than I anticipated. While writing code generation for the AddExpr, I had to edit the code multiple times to get it right. For the SPU there are five possible outcomes for the generated code:

It may seem easy to code these three rules to generate the correct outcome, but there are other factors that make it hard:

And the list goes on. It’s true that some items on that list are easy to implement, but writing the generation code for AddExpr bothered me enough to look for alternatives. One that looks promising is using graphs and [A*](http://en.wikipedia.org/wiki/A*_search_algorithm). I haven’t implemented the whole thing yet, but I want to share what I’m thinking of doing here and hopefully get some feedback.

Code Generation as Data Transformation

Consider the AddExpr class in the SimpleC source code. It has two operands of type Node and must output a Variable that is the identifier that holds the result of the addition. So the question we’re trying to answer is “how can we transform a tuple <Node, Node> into a Variable such that the resulting variable has the sum of the elements of the tuple?”

So I thought I could write a set of rules describing data transformations and let the computer figure out the best outcome each time AddExpr (and any other AST node for that matter) is compiled. The idea was born.
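
To make the idea concrete, here’s a minimal sketch of the contract we’re after; Node and Variable here are placeholder stand-ins I’m using for illustration, not the actual SimpleC classes.

```java
// Hypothetical sketch of the transformation we want the compiler to figure
// out on its own: <Node, Node> -> Variable, where the variable holds the sum.
public class AddExprContract {
    static class Node { }                      // stand-in for an AST operand
    static class Variable {                    // stand-in for an identifier holding a value
        final String name;
        Variable(String name) { this.name = name; }
    }

    static Variable compileAdd(Node left, Node right) {
        // ...to be filled in by searching the rule graph described below...
        return new Variable("sum");
    }
}
```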

Data Transformation as a Graph

The first thing we need is some way to represent the set of rules describing the possible transformations of data. Constraining ourselves only to the subset necessary to compile an AddExpr node, we can build a graph like this:

[Figure: AddExpr transformation graph]

The transformations’ constraints are as follows:

Transformations also have a cost associated with them. For example, going from <int> to <float> costs 7, the cost of si_csflt. Some transformations don’t cost anything and only serve to change the meaning of the data, such as going from <Node> to <512>.
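
As a rough sketch of how such a rule could be represented (the field names and the string-based constraint and cost expressions are my assumptions, anticipating the BeanShell idea further down):

```java
import java.util.List;

// One transformation rule: an edge in the graph, with a minimum cost for the
// heuristic and optional constraint/cost expressions to be evaluated later.
public class Transformation {
    final String from;        // e.g. "<int>"
    final String to;          // e.g. "<float>"
    final int minCost;        // lowest possible cost, used by the heuristic
    final String constraint;  // expression that must hold, or null if unconditional
    final String cost;        // expression giving the actual cost

    Transformation(String from, String to, int minCost, String constraint, String cost) {
        this.from = from; this.to = to; this.minCost = minCost;
        this.constraint = constraint; this.cost = cost;
    }

    // A few of the rules needed for AddExpr; the constraint and cost
    // expressions are purely illustrative.
    static final List<Transformation> RULES = List.of(
        new Transformation("<int>", "<float>", 7, null, "7"),                  // si_csflt
        new Transformation("<Node>", "<int>", 2, "node.isIntConstant()", "costOfConstant(node)"),
        new Transformation("<add int>", "<add float>", 7, null, "7")
    );
}
```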

There’s a problem with this graph, though. It’s impossible to start with <Node, Node>, the two operands for an addition in the abstract syntax tree, and arrive at an addition. That node isn’t even in the graph! We could add it and, by combining transformations of <Node> to other nodes, build other nodes and edges that would allow us to arrive at <add int> or <add float>. But it would be painful to do this by hand, so instead we write some code that takes a <Node, Node> and builds all these nodes for us, which results in the following graph:

[Figure: complete AddExpr transformation graph]

Ugh, good thing we let the computer do the work for us. It’s worth noting one thing, though: there are no transformations that act on both elements of the tuple at the same time. For example, there isn’t a transformation from <int, int> to <float, float>. This is because the same thing can be accomplished with a transformation from <int, int> to <float, int> and another from there to <float, float>; the cost is the same either way, and this keeps the node count smaller.
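
The code that builds the pair graph could look something like the sketch below, which repeatedly applies each single-element rule to one side of the tuple at a time. It reuses the hypothetical Transformation class from the earlier sketch; the pair-level rules that produce <add int> and <add float> would be added separately.

```java
import java.util.*;

// Rough sketch of the "let the computer do the work" step: starting from
// <Node, Node>, apply every single-element rule to either side of the pair
// to enumerate the pair nodes and edges.
public class PairGraphBuilder {
    // An edge in the pair graph: fromPair -> toPair with the rule's minimum cost.
    record PairEdge(String fromPair, String toPair, int minCost) { }

    static List<PairEdge> build(List<Transformation> rules) {
        List<PairEdge> edges = new ArrayList<>();
        Deque<String[]> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(new String[] { "<Node>", "<Node>" });

        while (!work.isEmpty()) {
            String[] pair = work.pop();
            String key = pair[0] + ", " + pair[1];
            if (!seen.add(key)) continue;          // already expanded this pair

            for (Transformation rule : rules) {
                // Transform only one element at a time; transforming both in a
                // single step is never needed, as noted above.
                for (int side = 0; side < 2; side++) {
                    if (rule.from.equals(pair[side])) {
                        String[] next = pair.clone();
                        next[side] = rule.to;
                        edges.add(new PairEdge(key, next[0] + ", " + next[1], rule.minCost));
                        work.push(next);
                    }
                }
            }
        }
        return edges;
    }
}
```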

To shrink the graph a little, we can get rid of nodes that don’t have paths to the target nodes, <add int> and <add float>, since those are the ones we have to reach to perform the addition. Doing this (using the distance matrix from the next section) leads us to our final graph:

[Figure: simplified complete AddExpr transformation graph]
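
The pruning step itself is trivial once the distance matrix exists. Here’s a small sketch, assuming the distances are kept in maps keyed by node name with unreachable nodes simply absent; the map layout is my assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Keep only the pair nodes that can reach at least one of the two target
// nodes according to the distance matrix from the next section.
public class GraphPruner {
    static Set<String> prune(Map<String, Integer> distToAddInt,
                             Map<String, Integer> distToAddFloat,
                             Set<String> nodes) {
        return nodes.stream()
                    .filter(n -> distToAddInt.containsKey(n) || distToAddFloat.containsKey(n))
                    .collect(Collectors.toSet());
    }
}
```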

Code Generation as A*

With our graph in place, all we have to do is give A* the start node <Node, Node> and one of the goal nodes, and voilà, our code is generated! Great, so we’ll have to code A*, which is easy, and the heuristic function, which is also easy… or is it?
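
The A* part really is the easy bit; a plain textbook implementation over the transformation graph would do, something like the sketch below. The Edge and Heuristic types are mine, not SimpleC’s.

```java
import java.util.*;

// Textbook A* over the transformation graph: returns the cheapest sequence
// of nodes from start to goal, i.e. the transformations to emit as code.
public class AStar {
    record Edge(String to, int cost) { }

    interface Heuristic { int estimate(String node, String goal); }

    static List<String> search(Map<String, List<Edge>> graph, String start,
                               String goal, Heuristic h) {
        Map<String, Integer> g = new HashMap<>();      // best known cost so far
        Map<String, String> cameFrom = new HashMap<>();
        PriorityQueue<String> open = new PriorityQueue<>(
            Comparator.comparingInt((String n) -> g.get(n) + h.estimate(n, goal)));

        g.put(start, 0);
        open.add(start);

        while (!open.isEmpty()) {
            String current = open.poll();
            if (current.equals(goal)) {
                // Rebuild the path from the goal back to the start node.
                LinkedList<String> path = new LinkedList<>();
                for (String n = current; n != null; n = cameFrom.get(n)) path.addFirst(n);
                return path;
            }
            for (Edge e : graph.getOrDefault(current, List.of())) {
                int tentative = g.get(current) + e.cost();
                if (tentative < g.getOrDefault(e.to(), Integer.MAX_VALUE)) {
                    g.put(e.to(), tentative);
                    cameFrom.put(e.to(), current);
                    open.remove(e.to());   // re-insert with the updated priority
                    open.add(e.to());
                }
            }
        }
        return List.of();   // no path: the addition cannot be generated this way
    }
}
```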

A* is largely used for path finding, where nodes are spatially positioned in relation to each other, as in a map or maze. In those cases, implementing the heuristic is straightforward: for the former the Euclidean distance can be used, and for the latter the Manhattan distance is a better fit if only axis-aligned movements are allowed. Unfortunately, neither of those fits our case: our nodes cannot be spatially positioned like that, and the actual cost of an edge depends on the path taken to the current node so far.

For A* to find the best path in the graph, the heuristic must be an admissible heuristic. In short, the heuristic from node x to node y must never overestimate the actual lowest cost to go from x to y. So we have two things to consider if we are to build an admissible heuristic for our graph:

The constraints are easy to deal with. If an edge has a constraint that is not satisfied at some point of the search, we can look at it as having an infinite cost. On the other hand, if the constraint can be satisfied, the edge keeps its actual traversal cost. Since the heuristic must be admissible, we take the lower of these two values, which is of course the actual edge cost.

The variable cost isn’t hard to handle either. If it’s variable, how can we be sure we’re not overestimating it? An easy answer is to just consider the cost to be zero, but that makes the search slower. What we need is the actual minimum cost for the edge, and we know it for all edges. For example, going from <Node> to <int> means a constant must be built for the integer value of the Node. The minimum cost for building a constant is 2 (using si_il or a similar constant-formation intrinsic). Some integers will require two instructions, or one odd-pipe instruction with a cost of 4, but since we’re looking for the minimum cost we can take it as 2 in this case. So we can store the minimum cost of an edge as one of its attributes and use it for the heuristic when the actual edge cost is variable.
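
One way to read this in code is the sketch below, which contrasts the cost the search actually pays with the cost the heuristic is allowed to assume. It again uses the hypothetical Transformation class, plus an evaluator interface that BeanShell will eventually back.

```java
// The two edge costs discussed above: the real cost used while searching
// (infinite when the constraint fails) and the stored minimum cost used
// when computing the admissible heuristic.
public class EdgeCosts {
    static final int INFINITE = Integer.MAX_VALUE / 2;   // avoid overflow when adding

    // Hypothetical evaluator interface, later backed by BeanShell.
    interface Evaluator {
        boolean isSatisfied(String constraintExpr);
        int evaluateCost(String costExpr);
    }

    // Cost used by the search itself: respects constraints and variable costs.
    static int actualCost(Transformation t, Evaluator eval) {
        if (t.constraint != null && !eval.isSatisfied(t.constraint)) return INFINITE;
        return eval.evaluateCost(t.cost);
    }

    // Cost used when building the heuristic: never higher than any actual
    // cost, e.g. 2 for <Node> -> <int> even though some constants cost more.
    static int heuristicCost(Transformation t) {
        return t.minCost;
    }
}
```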

Another thing to consider is how we evaluate the heuristic. It must not overestimate the actual cost, and the actual cost is the cost of the cheapest path, so it seems we’d have to find the cheapest path between two nodes just to get the heuristic right. That wouldn’t work, since we’d be using A* to evaluate the heuristic for A*; if that were the case we’d be better off using a breadth-first search.

To be able to evaluate the heuristic, we can run Dijkstra’s algorithm for all nodes and build a distance matrix. Here are the columns of the matrix for the two target nodes, showing the minimum cost to get from every node to each of them:

| Source Node | Distance to <add int> | Distance to <add float> |
|---|---|---|
| <Node, ConstantExpr> | 4 | 8 |
| <add float> | - | 0 |
| <MulExpr, float> | - | 6 |
| <int, int> | 2 | 9 |
| <int, s10> | 2 | 9 |
| <int, float> | - | 13 |
| <MulExpr, Node> | - | 8 |
| <ConstantExpr, float> | - | 8 |
| <s10, Node> | 4 | 11 |
| <Node, Node> | 4 | 8 |
| <add int> | 0 | 7 |
| <float, ConstantExpr> | - | 8 |
| <Node, 512> | 4 | 11 |
| <Node, int> | 2 | 9 |
| <int, ConstantExpr> | 2 | 9 |
| <ConstantExpr, ConstantExpr> | 4 | 10 |
| <int, 512> | 2 | 9 |
| <int, Node> | 2 | 9 |
| <float, float> | - | 6 |
| <ConstantExpr, 512> | 4 | 11 |
| <512, Node> | 4 | 11 |
| <float, Node> | - | 6 |
| <MulExpr, int> | - | 13 |
| <float, MulExpr> | - | 6 |
| <Node, float> | - | 6 |
| <int, MulExpr> | - | 13 |
| <512, int> | 2 | 9 |
| <ConstantExpr, s10> | 4 | 11 |
| <Node, MulExpr> | - | 8 |
| <512, ConstantExpr> | 4 | 11 |
| <ConstantExpr, int> | 2 | 9 |
| <s10, int> | 2 | 9 |
| <float, int> | - | 13 |
| <s10, ConstantExpr> | 4 | 11 |
| <ConstantExpr, MulExpr> | - | 8 |
| <ConstantExpr, Node> | 4 | 8 |
| <Node, s10> | 4 | 11 |
| <MulExpr, ConstantExpr> | - | 8 |

It’s interesting to note that not all source nodes can arrive at <add int>, but all of them can arrive at <add float>. This is because we have edges from <int> to <float> and from <add int> to <add float>, so if it’s possible to arrive at <add int> it’s also possible to arrive at <add float>.
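
Each column of that matrix can be computed with a single Dijkstra run from the target over the reversed graph, using each edge’s minimum cost so the resulting distances stay admissible. A sketch, reusing the Edge type from the A* sketch above:

```java
import java.util.*;

// Build one column of the distance matrix: minimum cost from every node to
// the given target. The heuristic is then simply h(n) = dist.get(n).
public class DistanceMatrix {
    static Map<String, Integer> distancesTo(Map<String, List<AStar.Edge>> graph, String target) {
        // Reverse the graph so one Dijkstra run gives node -> target costs.
        Map<String, List<AStar.Edge>> reversed = new HashMap<>();
        graph.forEach((from, edges) -> edges.forEach(e ->
            reversed.computeIfAbsent(e.to(), k -> new ArrayList<>())
                    .add(new AStar.Edge(from, e.cost()))));

        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<String> queue =
            new PriorityQueue<>(Comparator.comparingInt((String n) -> dist.get(n)));
        dist.put(target, 0);
        queue.add(target);

        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (AStar.Edge e : reversed.getOrDefault(node, List.of())) {
                int d = dist.get(node) + e.cost();
                if (d < dist.getOrDefault(e.to(), Integer.MAX_VALUE)) {
                    dist.put(e.to(), d);
                    queue.remove(e.to());   // re-insert with the updated priority
                    queue.add(e.to());
                }
            }
        }
        return dist;   // nodes absent from the map cannot reach the target ("-")
    }
}
```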

With our graph and search algorithm in place we can finally start generating code. Well, almost. There’s still one problem: how do we dynamically evaluate edge constraints and variable costs? We’re in luck, because there’s already a library that does exactly this, evaluating Java source code contained in strings: BeanShell.

Evaluating Constraints and Costs

I haven’t coded this part yet, but I’ve used BeanShell before and it delivers on its promises. I can’t see anything that would prevent us from evaluating constraints and costs, which is the last barrier to actually generating code. I’ll keep working along these lines and hopefully have something to write about in a month or so (no promises, though!)
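
For the record, here’s roughly what evaluating a constraint and a variable cost with BeanShell looks like; the expressions and the variables bound into the interpreter are just illustrative, not the rules SimpleC will actually use.

```java
import bsh.Interpreter;

// Minimal BeanShell sketch: evaluate a constraint and a cost expression
// that are stored as plain Java source strings.
public class RuleEvaluator {
    public static void main(String[] args) throws Exception {
        Interpreter bsh = new Interpreter();

        // Expose whatever the rule expressions may refer to.
        bsh.set("value", 100000);

        // A constraint: does the constant fit a cheap single-instruction load?
        boolean fits = (Boolean) bsh.eval("value >= -32768 && value <= 32767");

        // A variable cost: 2 for the cheap case, 4 otherwise (numbers taken
        // from the discussion above, purely illustrative).
        int cost = (Integer) bsh.eval("(value >= -32768 && value <= 32767) ? 2 : 4");

        System.out.println("constraint satisfied: " + fits + ", cost: " + cost);
    }
}
```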

Conclusion

This has been a lot of work, but I strongly believe it’ll prove to be a huge time saver considering the amount of work I had with AddExpr. But there are many more benefits:

I’m also excited about the possibilities:

Thanks for reading so far. Please use the comments section below and give me some feedback. Tell me what’s right and what’s wrong with this approach to code generation. Tell me what more it can accomplish or point me to a fundamental flaw in my train of thought that will kill it. Either way, thank you!

See ya!
