In compiler optimization, register allocation is the process of multiplexing a large number of target program variables onto a small number of CPU registers. The goal is to keep as many operands as possible in registers to maximise the execution speed of software programs. Register allocation can happen over a basic block (local register allocation), over a whole function or procedure (global register allocation), or across function boundaries as part of a calling convention (interprocedural register allocation).
Most computer programs need to process large numbers of different data items. However, most CPUs can only perform operations on a small fixed number of "slots" called registers. Even on machines that support memory operands, register access is considerably faster than memory access. Variables not allocated to registers must be loaded from and stored back to random access memory whenever they are used.
Register spilling occurs when there are more live variables than the machine has registers. When a compiler generating machine code runs into this situation, it has to transfer or "spill" some variables from registers to memory. This incurs a cost, as access to memory is typically slower than access to a register.
In compilers, register pressure occurs when there are more variables to allocate than there are registers available. This typically results in register spilling.
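As a rough illustration, register pressure can be estimated as the largest number of variables that are live at the same time. The sketch below assumes each variable is summarized by a hypothetical (start, end) live interval over instruction indices; the names max_pressure and intervals are made up for the example and do not come from any particular compiler.

```python
def max_pressure(intervals):
    """Return the largest number of variables live at the same time.

    `intervals` maps a variable name to a (start, end) pair of instruction
    indices, both inclusive. This representation is an assumption made for
    the example.
    """
    events = []
    for start, end in intervals.values():
        events.append((start, 1))       # variable becomes live
        events.append((end + 1, -1))    # variable is dead after `end`
    live = peak = 0
    # Sorting places -1 before +1 at the same index, so an interval that
    # ended just before another one starts is not counted as overlapping.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


# Hypothetical live intervals for four variables.
intervals = {"a": (0, 4), "b": (1, 3), "c": (3, 6), "d": (5, 7)}
registers = 2
pressure = max_pressure(intervals)
print(pressure, "variables live at once;",
      "spilling needed" if pressure > registers else "fits in registers")
```

With these intervals, a, b and c overlap at instruction 3, so a machine with two registers would have to spill at least one of them.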
Challenges
Register allocation is an NP-complete problem (Gregory J. Chaitin, Mark A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein, "Register allocation via coloring", Computer Languages, 6:47-57, 1981; Fernando Magno Quintão Pereira and Jens Palsberg, "Register Allocation after Classical SSA Elimination is NP-complete", http://www.cs.ucla.edu/~palsberg/paper/fossacs06.pdf). The number of variables in a typical program is much larger than the number of available registers in a processor, so the contents of some variables have to be spilled (saved) into memory locations. The cost of such spilling is minimised by spilling the least frequently used variables first, but it is not easy to know which variables will be used the least. In addition, the hardware and operating system may impose restrictions on the usage of some registers.
Global register allocation
Like most other compiler optimizations, register allocation is based on the result of some compiler analysis, mostly the result of live variable analysis from data flow analysis.
Traditional allocators perform global register allocation using a graph coloring algorithm devised by Chaitin et al. It can be divided into two phases:
1. Machine instructions are generated as if there were an infinite number of symbolic registers, so all variables suitable for residing in registers are assigned to numbered logical registers. This phase is sometimes called register variable recognition.
2. Symbolic registers are replaced by physical registers of the target machine, with the minimum cost of register spilling.
In phase two, an interference graph is constructed where nodes are program variables and an edge connects two nodes if they are live at the same time. More precisely, if one variable is live at the point where the other is defined, then they are said to interfere. If the graph can be colored with R colors then the variables can be stored in R registers. This insight was pointed out by John Cocke, "father of the RISC architecture". The problem is that coloring a graph is an NP-hard problem.
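The interference rule can be sketched for a single straight-line block by walking the instructions backwards while tracking the live set. The instruction encoding used here, a list of (defined, used) pairs plus a live_out set, is an assumption made for illustration, as is the function name build_interference.

```python
def build_interference(block, live_out):
    """Build an interference graph for one straight-line block.

    `block` is a list of (defined, used) pairs: `defined` is the variable an
    instruction writes (or None) and `used` lists the variables it reads.
    `live_out` is the set of variables still needed after the block.
    Returns a dict mapping each variable to the set of its neighbours.
    """
    graph = {}
    live = set(live_out)
    for defined, used in reversed(block):
        if defined is not None:
            graph.setdefault(defined, set())
            # The defined variable interferes with everything that is live
            # immediately after the definition.
            for other in live:
                if other != defined:
                    graph[defined].add(other)
                    graph.setdefault(other, set()).add(defined)
            live.discard(defined)
        for var in used:
            graph.setdefault(var, set())
        live.update(used)
    return graph


# Hypothetical block:  a = ...;  b = ...;  c = a + b;  d = c + a
block = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c", "a"])]
graph = build_interference(block, live_out={"d"})
for var in sorted(graph):
    print(var, "interferes with", sorted(graph[var]))
```

In this example b and c do not interfere, because b is read for the last time by the instruction that defines c, so the two could share a register.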
The key insight of Chaitin’s algorithm is called the degree < R rule, which is as follows. Given a graph G which contains a node N with degree less than R, G is R-colorable if and only if the graph G’, where G’ is G with node N removed, is R-colorable. The proof is obvious in one direction: if a graph G can be colored with R colors then the graph G’ can be created without changing the coloring. In the other direction, suppose we have an R-coloring of G’. Since N has a degree of less than R, there must be at least one color that is not in use for a node adjacent to N. We can color N with this color.
While G cannot be R-colored
    While graph G has a node N with degree less than R
        Remove N and its associated edges from G and push N on a stack S
    End While
    If the entire graph has been removed then the graph is R-colorable
        While stack S contains a node N
            Pop N from S, add it back to graph G and assign it one of the R colors not used by its neighbors
        End While
    Else graph G cannot be colored with R colors
        Simplify the graph G by choosing an object to spill and remove its node N from G
        (spill nodes are chosen based on object’s number of definitions and references)
    End If
End While
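The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Chaitin's implementation: the graph is a dict of neighbour sets, spill_cost is a hypothetical per-node estimate (defaulting to 1), the spill choice divides that cost by the node's degree, and the insertion of spill code followed by rebuilding the graph is left out.

```python
def color_graph(graph, r, spill_cost=None):
    """Try to color `graph` with `r` colors, spilling nodes when stuck.

    `graph` maps each node to the set of its neighbours.
    Returns (coloring, spilled): a dict node -> color in range(r) and the
    list of nodes that were chosen for spilling.
    """
    work = {n: set(adj) for n, adj in graph.items()}  # mutable copy
    spill_cost = spill_cost or {}
    stack, spilled = [], []

    def remove(node):
        # Delete the node together with its edges.
        for neighbour in work.pop(node):
            work[neighbour].discard(node)

    while work:
        low = next((n for n in work if len(work[n]) < r), None)
        if low is not None:
            stack.append(low)          # degree < R rule: safe to set aside
            remove(low)
        else:
            # Every remaining node has degree >= R: pick a spill candidate.
            victim = min(
                work,
                key=lambda n: spill_cost.get(n, 1) / max(len(work[n]), 1),
            )
            spilled.append(victim)
            remove(victim)

    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[nb] for nb in graph[node] if nb in coloring}
        # Fewer than R neighbours are colored at this point, so a free
        # color always exists.
        coloring[node] = next(c for c in range(r) if c not in used)
    return coloring, spilled


# Four mutually interfering variables cannot share three registers.
clique = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
coloring, spilled = color_graph(clique, 3)
print("colors:", coloring, "spilled:", spilled)
```

A production allocator would rewrite the program to load and store each spilled variable and then repeat the whole process, as the outer loop of the pseudocode indicates.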
This algorithm is O(n^2). It can be improved through subsumption, which is the act of coalescing nodes that are the source and target of copy operations into a single node before running the algorithm. This reduces the number of nodes to color but can increase the degree of the coalesced node. Coalescing can only be done when the nodes do not interfere with each other, however, and aggressive coalescing can lead to uncolorable graphs. Preston Briggs’ thesis work introduces safer methods to determine which nodes to coalesce and spill; based on his improvements, this algorithm is often called the Chaitin-Briggs algorithm. The subsumption step is slow and is not done in fast register allocators.
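To make the risk of aggressive coalescing concrete, the sketch below merges two copy-related nodes and applies a conservative check of the kind usually attributed to Briggs: merge only if the combined node would have fewer than R neighbours of degree R or more. The function names and the graph encoding are assumptions for this example, and the exact test used in any given allocator may differ.

```python
def can_coalesce(graph, a, b, r):
    """Conservative test: is merging `a` and `b` guaranteed to stay colorable?"""
    if b in graph[a]:
        return False  # interfering nodes can never share a register
    merged_neighbours = (graph[a] | graph[b]) - {a, b}
    significant = 0
    for n in merged_neighbours:
        degree = len(graph[n])
        if a in graph[n] and b in graph[n]:
            degree -= 1  # a and b collapse into a single neighbour of n
        if degree >= r:
            significant += 1
    return significant < r


def coalesce(graph, a, b):
    """Return a new graph in which node `b` has been merged into node `a`."""
    new = {n: set(adj) for n, adj in graph.items() if n != b}
    for n in graph[b]:
        new[n].discard(b)
        new[n].add(a)
        new[a].add(n)
    return new


# Path a - x - y - b, where a and b are related by a copy "b = a".
g = {"a": {"x"}, "x": {"a", "y"}, "y": {"x", "b"}, "b": {"y"}}
print(can_coalesce(g, "a", "b", 2))  # merging would create a triangle
print(can_coalesce(g, "a", "b", 3))
print(coalesce(g, "a", "b"))
```

The path is 2-colorable, but merging a and b produces a triangle that needs three colors, which is why the check rejects the merge for R = 2 and accepts it for R = 3.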
Recent developments
Graph coloring allocators produce efficient code, but their allocation time is high. In cases of static compilation, allocation time is not a significant concern. In cases of dynamic compilation, such as just-in-time (JIT) compilers, fast register allocation is important. An efficient technique proposed by Poletto and Sarkar is linear scan allocation. This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The results are on average only 12% less efficient than graph coloring allocators.
The linear scan algorithm follows:
Perform dataflow analysis to gather liveness information. Keep track of all variables’ live intervals, the interval when a variable is live, in a list sorted in order of increasing start point (note that this ordering is free if the list is built when computing liveness.) We consider variables and their intervals to be interchangeable in this algorithm.
Iterate through liveness start points and allocate a register from the available register pool to each live variable.
At each step maintain a list of active intervals sorted by the end point of the live intervals. (Note that a balanced binary tree can be used to maintain this list at logarithmic cost per insertion.) Remove any expired intervals from the active list and return each expired interval’s register to the available register pool.
In the case where the active list has size R we cannot allocate a register from the pool. In this case, compare the current interval with the interval in the active list that has the furthest end point, and spill whichever of the two ends later. If an active interval is spilled, assign its register to the current interval and add the current interval to the active list; if the current interval is the one spilled, do not change register assignments.
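The steps above can be sketched as follows. This is a simplified illustration rather than the published implementation: live intervals are assumed to be given as hypothetical (start, end) pairs with inclusive end points, registers are numbered from 0, and the active list is a plain sorted Python list instead of a balanced tree.

```python
import bisect


def linear_scan(intervals, r):
    """Allocate `r` registers to live intervals in one pass.

    `intervals` maps a variable to its (start, end) live interval.
    Returns (assignment, spilled): a dict variable -> register number and
    the list of variables left in memory.
    """
    free = list(range(r))   # available register pool
    active = []             # (end, variable) pairs, sorted by end point
    assignment, spilled = {}, []

    by_start = sorted(intervals.items(), key=lambda item: item[1][0])
    for var, (start, end) in by_start:
        # Expire intervals that ended before the current one starts.
        while active and active[0][0] < start:
            _, expired = active.pop(0)
            free.append(assignment[expired])
        if free:
            assignment[var] = free.pop(0)
            bisect.insort(active, (end, var))
        elif active and active[-1][0] > end:
            # The active interval with the furthest end point is spilled
            # and its register is handed to the current interval.
            _, victim = active.pop()
            assignment[var] = assignment.pop(victim)
            spilled.append(victim)
            bisect.insort(active, (end, var))
        else:
            spilled.append(var)  # the current interval ends last: spill it
    return assignment, spilled


# Hypothetical intervals: "a" is long-lived, the others are short.
intervals = {"a": (0, 9), "b": (1, 3), "c": (2, 5), "d": (4, 6)}
assignment, spilled = linear_scan(intervals, 2)
print("registers:", assignment, "spilled:", spilled)
```

With two registers the long-lived variable a is the one spilled, matching the observation that long live ranges tend to end up in memory.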
Cooper and Dasgupta recently developed a "lossy" Chaitin-Briggs graph coloring algorithm suitable for use in a JIT (Cooper, Dasgupta, "Tailoring Graph-coloring Register Allocation For Runtime Compilation", http://llvm.org/pubs/2006-04-04-CGO-GraphColoring.html). The "lossy" moniker refers to the imprecision the algorithm introduces into the interference graph. This optimization reduces the cost of the graph building step of Chaitin-Briggs, making it suitable for runtime compilation. Experiments indicate that this lossy register allocator outperforms linear scan on the majority of tests used.
"Optimal" register allocation algorithms based on Integer Programming have been developed by Goodwin and Wilken for regular architectures. These algorithms have been extended to irregular architectures by Kong and Wilken.
While the worst case execution time is exponential, the experimental results show that the actual time is typically of order O(n^{2.5}) in the number of constraints (Kong, Wilken, "Precise Register Allocation for Irregular Architectures", http://www.ece.ucdavis.edu/cerl/cerl_arch/irreg.pdf).
References