Name	Name	Last commit message	Last commit date
parent directory ..
docs	docs
src	src
README.md	README.md
pom.xml	pom.xml

Chapter 3: Variable Declarations

In this chapter we extend the language grammar to allow variable declarations. This allows us to write:

int a=1;
int b=2;
int c=0;
{
    int b=3;
    c=a+b;
}
return c;

Here is the complete language grammar for this chapter.

You can also read this chapter in a linear Git revision history on the linear branch and compare it to the previous chapter.

Extensions to Intermediate Representation

To recap, our list of Nodes are:

Node Name	Type	Description	Inputs	Value
Start	Control	Start of function	None	None for now as we do not have function arguments yet
Return	Control	End of function	Predecessor control node, Data node value	Return value of the function
Constant	Data	Constants such as integer literals	None, however Start node is set as input to enable graph walking	Value of the constant
Add	Data	Add two values	Two data nodes, values are added, order not important	Result of the add operation
Sub	Data	Subtract a value from another	Two data nodes, values are subtracted, order matters	Result of the subtract
Mul	Data	Multiply two values	Two data nodes, values are multiplied, order not important	Result of the multiply
Div	Data	Divide a value by another	Two data nodes, values are divided, order matters	Result of the division
Minus	Data	Negate a value	One data node, value is negated	Result of the unary minus

In this chapter we introduce:

Node Name	Type	Description	Inputs	Value
Scope	Symbol Table	Represents scopes in the graph	All nodes that define variables	None

Symbol Tables

To support variable declarations and lexical scopes, we introduce symbol tables in the parser. A symbol table maps names to nodes.

We maintain a stack of symbol tables. When a new lexical scope is created we push a new symbol table to the stack; when scope exits the topmost symbol table is popped from the stack.

Declaring a name adds it to the current symbol table. If a name is assigned to, then its mapping in the most recent symbol table is updated. If a name is accessed, its mapping is looked up in the symbol tables. The lookup goes up the stack of symbol tables, in lexical order.

ScopeNode

A ScopeNode encapsulates the stack of symbol tables that are used to implement lexical scopes.

The nodes referenced by the names in a lexical scope become inputs to the ScopeNode; this makes ScopeNode a user of these values, thus keeping them alive while a particular lexical scope is alive. When the lexical scope ends, those inputs are removed. This is crucial because peepholes will remove dead Nodes and every new Node is effectively dead at the instance of creation without such a reference.
In Chapter 5 when we implement branches, the ScopeNode gets duplicated at a branch point and then gets merged when the branch joins. This allows the names to evolve in each branch independently. At the merge point names that have diverged require a Phi node.
In Chapter 7 additional logic is deployed for handling Phis from back edges.
The ScopeNode is used to track some other values such as the Control token starting from Chapter 4, and Aliases starting from Chapter 10. Special name bindings are used for these, and each name is tracked across branches and then merged at merge points.
In Chapter 10 the ScopeNode is enhanced to also maintain the declared type for each name.

From an implementation standpoint, each symbol table is represented as a HashMap<String,Integer>, where the String is the variable name, and the Integer is an index into the inputs of the ScopeNode. To access the Node associated with a name, first the symbol table is consulted, and then the ScopeNode's inputs to find the required Node. The stack of symbol tables is represented as Stack<HashMap<String,Integer>>.

When a lexical scope is exited, we not only pop the symbol table representing the scope from the stack, but also remove the relevant input nodes. If the variable's expression only exists for the Scope mapping (e.g., "just in case" the Parser parses code that uses the variable) this is when the expression dies:

  int a=1;
  {
      int b=a*a*a;
  }  // When this scope closes & ends, b (and the cube of a) dies
  return a;

In the visualization, we show the ScopeNode in a separate box running underneath the main IR graph; the stack of symbol tables is shown inside the box. We show the edge from each variable in the symbol table to the defining Node in the graph.

The ScopeNode is neither a Data Node nor a Control Node; it really is a utility for maintaining symbol tables, name lookups, plays a crucial role in deciding where Phi nodes are required. Representing it as a Node allows us to leverage the Sea of Nodes def-use architecture to maintain liveness of values in scope.

A ScopeNode has no outputs.

Demonstration

We show the output from following code.

01      int a=1;
02      int b=2;
03      int c=0;
04      {
05          int b=3;
07          c=a+b;
08      }
09      return c;

When we start parsing above, we start with a symbol table (at level 0). The variables a, b, and c are registered in this table.
On line 4, we start a nested scope. At this point, a new symbol table is pushed (at level 1).
On line 5, a variable b is declared. b is registered in the symbol table at level 1. This then hides the variable with the same name on line 2.
On line 7, we update the variable c. Variable lookup will find it in the symbol table at level 0. The node graph is shown below (note that peephole opts are switched off here so that we can see the graph as its initially constructed).

The diagram shows the ScopeNode and the symbol tables in the bottom cluster. Each table is annotated by its scope level, and shows the variables defined in that scope.

When we exit the nested scope on line 8, the symbol table at level 1 is popped.
At line 9, our graph looks like this:

As we see, only 1 symbol table is present at this point.

We can also have multiple scopes at the same level during the execution of the program, but only one will be active at a time because the symbol table can only hold one symbol table for each scope level.

For example:

01   int a=1; 
02   int b=2; 
03   int c=0; 
04   { 
05       int b=5; 
06       c=a+b; 
08   } 
09   {
10       int e=6; 
11       c = a+e;  
13   } 
14    return c;

Here both scopes starting from (04-08) and (09-13) are at the same level:

public final Stack<HashMap<String, Integer>> _scopes;

The _scopes stack holds nested scopes, with index 0 representing the highest (global) scope and deeper indices representing more nested scopes. Popping returns the current scope and moves to the outer one. There is a direct one-to-one mapping between each index and its corresponding scope.

The first nested scope(04-08)

The second nested scope(09-13)

In both cases the index which represents the deepness of the scope is 1.

Enabling peephole optimizations

Once we enable peephole optimizations, the parser is able to reduce all the expressions down to a constant. This is illustrated nicely below.

01      int x0=1;
02      int y0=2;
03      int x1=3;
04      int y1=4;
05      return (x0-x1)*(x0-x1) + (y0-y1)*(y0-y1);

The graph for above, is displayed below.

The return node has the constant 8 as its data value! When the parsing ends and the scope at level 0 is removed, only the return 8 remains.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chapter03

chapter03

README.md

Chapter 3: Variable Declarations

Extensions to Intermediate Representation

Symbol Tables

ScopeNode

Demonstration

Enabling peephole optimizations

Files

chapter03

Directory actions

More options

Directory actions

More options

Latest commit

History

chapter03

Folders and files

parent directory

README.md

Chapter 3: Variable Declarations

Extensions to Intermediate Representation

Symbol Tables

ScopeNode

Demonstration

Enabling peephole optimizations