Iterative Refinement

Problem

Many computations consist of a sequence of high-level steps that repeat until some exit condition is met. Within each step, a number of mostly independent computations occur. How do you exploit the concurrency implied by this computational structure?

Context

A large class of computations consists of an abstract objective that is initialized and then refined through a collection of steps until a termination condition is met. The computations inside a step break down into many concurrent tasks. These tasks are either independent or they can be made independent.

A simple way of carrying out a non-linear optimization computation follows: a search space is sampled to generate a large number of points, each point is evaluated against some objective function, and the best point is identified. This repeats, iteration after iteration, until a sufficiently good result is found.
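The sampling-based optimization loop described above can be sketched as follows. This is a minimal illustration, not a production optimizer; the objective function, search bounds, sample counts, and tolerance are all illustrative assumptions.

```python
# Iterative refinement for non-linear optimization: sample points, evaluate
# them as independent concurrent tasks, keep the best, and repeat until the
# exit condition is met.
import random
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # Toy objective with its minimum at x = 3.0 (an assumption for the demo).
    return (x - 3.0) ** 2

def refine(iterations=50, samples_per_step=64, seed=0):
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(iterations):                      # sequence of steps
            points = [rng.uniform(-10.0, 10.0) for _ in range(samples_per_step)]
            values = list(pool.map(objective, points))   # independent tasks
            i = min(range(samples_per_step), key=values.__getitem__)
            if values[i] < best_f:                       # closing phase
                best_x, best_f = points[i], values[i]
            if best_f < 1e-4:                            # exit condition
                break
    return best_x, best_f
```

The four structural parts of the pattern (initialization, the serial step sequence, the independent tasks, and the exit test) are each visible as one region of this loop.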

Forces

There is often a tradeoff between the number of iterations and how much work is carried out for each iteration.

At each step of the iteration there may be advantages to finer-grained computation, but a large number of fine-grained tasks carries a higher synchronization cost at the barrier that closes each iteration than a smaller number of coarse-grained processes would.

A related situation arises when nested iteration spaces are generated to create enough tasks to keep the computational resources fully occupied. But this increases the complexity of the software and tends to increase overhead. Typically this would be synchronization overhead, but in a language such as Cilk this appears as thread management overhead.
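One common way to generate enough tasks from nested iteration spaces, as mentioned above, is to flatten the nested loops into a single task index. The sketch below assumes a small two-dimensional grid and a stand-in per-cell function; both are illustrative, not taken from the text.

```python
# Flattening a nested (i, j) iteration space into one task index k, so that
# a single pool of workers can be kept fully occupied.
from concurrent.futures import ThreadPoolExecutor

NI, NJ = 8, 16                        # nested iteration space: NI x NJ cells

def cell_work(k):
    i, j = divmod(k, NJ)              # recover the (i, j) pair from k
    return i * j                      # stand-in for the real per-cell work

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cell_work, range(NI * NJ)))

total = sum(results)                  # closing reduction over all cells
```

The cost noted in the text shows up here as pool-management and task-dispatch overhead, which grows with the number of flattened tasks.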

Solution

The solution to the iterative refinement pattern contains four parts:

  1. Initialize the computation
  2. The sequence of computational steps
  3. The collection of tasks that execute inside a step
  4. An exit condition

The first part sets up the state of the computation including any global data structures required to manage the tasks within each step.

The second part, the “sequence of computational steps”, implies an ordering over an iteration space. This inherently serial code sets up the iteration space, keeps track of which iteration is in progress, and in most cases includes the logic to detect the exit condition.

The third part, the “collection of tasks that execute inside a step”, is where concurrency is exploited. As discussed in the “task parallelism” pattern, the key is to use some technique to separate the dependencies from the collection of tasks so that the tasks can execute independently. The dependencies are then resolved in a distinct closing phase at the end of the step.

For example, in the simple case of an optimization problem, all that is needed to manage the separable dependency is to replicate the data structure comprising the state of the computation on each unit of execution. Each unit updates its local copy as its tasks complete. Then, as a separate closing step, a reduction of the data structure across the units of execution is carried out.
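A minimal sketch of this replicate-and-reduce technique follows. Each unit of execution keeps a private copy of the “best so far” state and a closing reduction merges the copies; the objective function, the data, and the chunking scheme are illustrative assumptions.

```python
# Separable dependency via replication: each worker reduces over its own
# chunk into private state, then a final reduction merges the partial results.
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    return (x - 3.0) ** 2            # toy objective, minimum at x = 3.0

def local_best(chunk):
    # Each task touches only its own replicated (best, value) state.
    best, best_f = None, float("inf")
    for x in chunk:
        f = objective(x)
        if f < best_f:
            best, best_f = x, f
    return best, best_f

def step(points, workers=4):
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_best, chunks))
    # Closing phase: reduce the replicated states across units of execution.
    return min(partials, key=lambda p: p[1])

best_x, best_f = step([float(x) for x in range(-5, 6)])
```

Because each task writes only its private copy, no locking is needed during the step; all sharing is confined to the closing reduction.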

In more complex examples where some direct communication is involved, the sequence of steps is split into two smaller sub-sequences. In the first substep, the send and receive functions are invoked, but none of the new values generated by the communication are used. At the close of this first substep, the communication events complete, and in the next substep the values are used. This is the basic idea behind the Bulk Synchronous Parallel (BSP) model [Valiant90].
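The split-phase structure can be sketched with threads and a barrier: substep 1 gathers the old neighbouring values into a buffer, the barrier marks the point where the “communication” completes, and substep 2 consumes the gathered values. The ring-averaging update and the problem size are illustrative assumptions, not part of the pattern itself.

```python
# BSP-style split-phase update on a ring: gather old values, synchronize,
# then apply the new values, so no task ever mixes old and new state.
import threading

N = 4
values = [0.0, 4.0, 8.0, 12.0]          # shared state, one cell per thread
buffers = [0.0] * N
barrier = threading.Barrier(N)

def worker(i, steps=10):
    for _ in range(steps):
        left, right = values[(i - 1) % N], values[(i + 1) % N]
        buffers[i] = (left + right) / 2.0   # substep 1: gather old values
        barrier.wait()                      # communication completes here
        values[i] = buffers[i]              # substep 2: use the new values
        barrier.wait()                      # keep all threads step-aligned

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The second barrier plays the role of the iteration boundary: no thread begins gathering for step k+1 until every thread has finished writing for step k.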

Finally a well defined exit condition is required. When the number of iterations is fixed in advance, this is accomplished by keeping track of an iteration counter. For more general problems, this may require tracking and then keeping an optimal value from a set or even refining global data structures representing the solution to a problem.

Invariant

Pre-condition: The initial state of a well-posed problem must be defined.

Invariant:

  • All tasks inside an iteration have completed and synchronized on exit.
  • The result of the computation does not depend on the order of the execution of the tasks inside a step.

Post-condition: The iteration exit condition is met.

Examples

Classification algorithms partition a larger set of objects into classified subsets. One approach uses a support vector machine trained with SMO (sequential minimal optimization). The basic algorithm consists of the following steps:

  1. Compute a distance between points and a frontier where distance is defined according to a metric particular to the problem domain.
  2. Search the set of distances to find the maximal outliers.
  3. Adjust the frontier or boundary between the classes.
  4. Exit condition: all points are within an error tolerance.

The step in the iterative refinement pattern corresponds to each update of the frontier. An initial frontier is selected, the distances computed over all points, and a new frontier is generated. This continues until the exit condition is satisfied.
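A much-simplified stand-in for this workflow (a perceptron-style update, not SMO itself) shows the same step structure: compute signed distances to the current frontier, find the worst violator, adjust the frontier, and exit when every point is classified within tolerance. The one-dimensional data, learning rate, and update rule are all assumptions made for illustration.

```python
# Frontier refinement sketch: each step scores all points against the
# current boundary w*x + b = 0, then moves the boundary toward the worst
# outlier, until no point violates the boundary.
def refine_frontier(points, labels, lr=0.1, max_steps=1000):
    w, b = 0.0, 0.0                        # initial frontier
    for _ in range(max_steps):             # one step per frontier update
        margins = [y * (w * x + b) for x, y in zip(points, labels)]
        worst = min(range(len(points)), key=margins.__getitem__)
        if margins[worst] > 0:             # exit: all points correctly sided
            return w, b
        x, y = points[worst], labels[worst]
        w += lr * y * x                    # adjust the boundary toward the
        b += lr * y                        # maximal outlier
    return w, b
```

The distance computations over all points inside one step are independent and could be farmed out as concurrent tasks, with the search for the worst violator as the closing reduction.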

Known Uses

This is a very common pattern. The SMO algorithm is commonly used to implement support vector machines. This pattern appears frequently in drug discovery, when the direct products of ligand and drug-target libraries are studied. More formally, the Bulk Synchronous Parallel (BSP) model, and the software systems built on it, were designed specifically to support this pattern.

Related Patterns

This pattern is closely related to the separable dependencies pattern described in the PLPP task parallelism pattern.

Authors

Kurt Keutzer and Tim Mattson

References

SMO … I still need a reference.

Leslie Valiant, “A bridging model for parallel computation,” Communications of the ACM, vol. 33, no. 8, pages 103–111, 1990.