PM Programming Language

PM (Parallel Models) is a new parallel programming language developed to implement numerical models on distributed systems.

The PM language has been designed to enable a programmer to concentrate on writing a model, rather than on making that model run on a particular system. A PM program will naturally scale from laptop to large cluster, taking full advantage of the hardware on each platform. At the same time it is possible to take direct control of distributed hardware when necessary. The aim is to allow a seamless transition from experimental code developed at the start of a research project all the way through to routine large runs on a supercomputer without extensive rewriting or restructuring.

PM uses a new approach – communicating operations – which combines the ability of Partitioned Global Address Space (PGAS) languages to access distributed data structures using simple subscripts with a straightforward approach to synchronisation designed to avoid race conditions without posing any undue burden on the programmer.

The PM language specification is released under the Creative Commons 4.0 Attribution Licence.

An pre-release implemetation of PM 0.4 is available now, released under the MIT licence. This consists of a PM to Fortran+MPI compiler, together with an PM interpreter aimed at debugging. Current plans include a PM to Fortran+MPI+OpenMP compiler and the addition of accelerator support (initially via with OpenMP or OpenACC). A C/C++ backend may be added to the compiler at a future date.

Overview

Overall look and feel

Parallel Execution

Distributed Execution

Stencils

Type system

More advanced synchronisation

Overall look and feel

PM adopts a standard C-like syntax. Statement blocks are enclosed in curly braces. Semicolons are optional if they would coincide with a new line. Comments are delineated with // for a single line or /* ... */ for a block of lines (which can be nested). There is a full modular structure, but this is lightweight in terms of the additional syntax incurred. Hello World in PM is simply:

print("Hello World")

A PM program consists of a set of procedure and/or type definitions followed by statements.

proc say_hello(name) {

print("Hello "++name)

}

say_hello("programmer")

Two particular features of PM syntax are where clauses and attributes. Where clauses can appear after most expressions and enable sub-expression to be factored out:

s=a * exp(b/a) where a=c/sqrt(b)

Attributes provide additional information to certain types of statement:

for i in A <<distr=BLOCK_CYCLIC>> {

update(&i)

}

Parallel Execution

The fundamental unit of PM parallelism is the strand. A strand is a very lightweight entity, that can map to a single execution of a loop body and does not need a separate stack. Strands can communicate with other strands (which may potentially be running on a different node in a distributed system) using communicating operations. Communicating operations also serve as a soft synchronisation points between strands.

The for statement executes its body of statements concurrently (and possibly in parallel) for every element of an array or set of index values. A separate strand is created for each index value and these strands are bundled together into a new parallel context.

for cell in model_array {

foreach invar step in start_time..end_time {

cell=model_step(cell)

}

The foreach statement provides a more conventional sequential loop. In the above example, the sequential loop over time steps is placed inside the parallel loop over grid cells. This is good PM style where parallel statements typically enclose sequential operations. The foreach invar variation of foreach indicates that the iteration space is identical for all strands in the parallel context - this both enables a more efficient implementation and allows some synchronisation rules to be relaxed.

Most models require interaction between strands. This is achieved using communicating operations. One of the simplest of these is the sync statement, which enables you to change the value of a variable defined outside of the scope of the current parallel statement (something that you cannot do using a conventional assignment statement):

// Parallel search for at least one point meeting a given criterion

var found=false

for point in search_space {

if criterion_met(point) : sync found=true

}

The sync statement synchronises all the strands in the current parallel context (i.e., those created by the enclosing for statement) and then executes its associated assignment simultaneously on all active strands (those strands where the point met the criterion). Note that this synchronisation imposes a logical order on operations and does not necessarily require a hard barrier.

Distributed Execution

PM is specifically designed to work with distributed systems. A PM program runs on a system consisting of a set of nodes (whose mapping to harware units is implementation dependent). Moving data between nodes is assumed to incur a substantial overhead. Language mechanisms exist to partition both memory and execution among the available nodes.

// Distribute two different-sized arrays among available nodes using fixed-size blocks

var A=array(1.0,[1..M],BLOCK)

var B=array(1.0,[1..N],BLOCK)

// The following will result in inter-node communication

B[N-M..N]=A

// The same copy using a for statement

// - for statement will use the same distribution as array A to assign strands to nodes

for cell in A {

// The following will result in inter-node communication

// - The current index position can be referred to using here

// - Assigning to cell will update the corresponding element in A

cell=B[here+origin] where origin=[N-M]

}

Stencils

As a numerical modelling language PM also specific support for stencils. For example, if a model is requires a cell to depend of neighbouring cells like this:

then it may be implemented using the nhd statement. This is implemented as a statement, rather than an operator or function to emphasise the possible re-organisation of strands within its body of statements (for example, scheduling the interior before the edges).

for point in [1..M,1..N] {

var cell = 0.0

over [1,]: cell=EAST_BOUNDARY

over [M,]: cell=WEST_BOUNDARY

over [,1]: cell=NORTH_BOUNDARY

over [,N]: cell=EAST_BOUNDARY

foreach invar step in start_time..end_time {

nhd ortho[-1..1,-1..1 ] dcell of cell bounds EXCLUDED {

cell=model_step(cell,north,south,east,west)

where

north=dcell[ 0,-1],

south=dcell[ 0, 1],

east =dcell[-1, 0],

west =dcell[ 1, 0]

}

Type system

PM makes use of modern developments in type inference. All PM procedures as essentially generic templates that may be applied to a range of concrete data types. This gives the flexibility of programming in interpreted languages such as Python but without the overhead – call resolution and procedure specialisation occurs at compile time in a manner that has some similarities to both Julia and Chapel but which supports polymorphism differently from either. A very simple PM program to perform a basic calculation could look like this:

proc length(x,y)=sqrt(x**2+y**2)

print("The length of (3,2) is:"++length(3,2))

print("The length of (3.5,2.3) is:"++length(3.5,2.3))

Here the procedure length is invoked with two different types of argument. The compiler will create separate procedure bodies for each combination of argument types – there is no run-time dynamic dispatch unless specifically requested by the programmer.

The type of any given expression in PM may be determined by inspection and variables do not change type during their lifetime. Determining variable types by initialisation thus works even for complex cases:

Procedure definitions may specify type constraints for their parameters. These are then used for compile-time procedure selection. Selection occurs across all parameters, giving rise to a system of multi-methods.

The following example demonstrates object oriented coding in PM. This example is statically resolved at compile time except for one location where dynamic dispatch is requested using the any statement. The cost of dynamic dispatch is the same as a single multi-way branch.

// Define shape as an extensible generic type

type shape is ...

// Define a private type for common elements

type _shape is rec{npoints:int}

// Define some types that are members of abstract type shape

type triangle in shape is rec {

use shape,

point1:tuple2d,

point2:tuple2d,

point3:tuple2d

}

type square in shape is rec {

use shape,

point1:tuple2d,

point2:tuple2d,

point3:tuple2d,

point4:tuple2d

}

// Now define some procedures to create new shape values

proc _shape(npoints)= new _shape{npoints=npoints}

proc make_triangle(point1,point2,point3)=

new triangle{

shape=_shape(3),

point1=point1,

point2=point2,

point3=point3

}

proc make_square(point1,point2,point3,point4)=

new square{

shape=_shape(4),

point1=point1,

point2=point2,

point3=point3,

point4=point4

}

// ... and some procedures to do something with these shapes (details omitted)

proc draw(shape:triangle) { … }

proc draw(shape:square) { … }

// ... and a procedure to fill in an array of polymorphic shape elements

proc fill_shapes_array(&shapes_array:array(*shape)) { … }

// Now some main program code

// Create and fill an array of polymorphic shape values

var shapes=array(make_triangle([0,0],[0,0],[0,0]) as <*shape>,[1..N])

fill_shapes_array(&shapes)

// Draw each shape and count the total points

var total_points=0

foreach shape in shapes {

// The any statement performs dynamic dispatch

any shape {

total_points=total_points+shape.npoints

draw(shape)

}

More advanced synchronisation

One of the design features of PM is the use of communicating operations to both initiate and synchronise the parallel execution of code. A PM program defines an expected order in which operation side-effects are expected to occur and the implementation then translates this into lower-level mechanisms: message passing, lock, barriers, etc. In most cases, the default ordering of operations is sufficient and little extra work is required on the part of the programmer. There are, however, a small number of additional synchronisation rules catering for more complex cases.

The sync statement does not allow two stands to simultaneously assign different values to the same variable or to the same variable sub-element. In the above example, this requirement is trivially satisfied. However, in more complex examples, particularly those involving arrays, this rule becomes more important:

var a=array(0,[0..N,0..N])

var b=array(0.0,[0..N,0..N])

for point=#a {

// Set elements in one quadrant of the array [0..N/2,0..N/2]

// This assignment works - each point is simultaneously assigned to

// by four stands, but each strand assigns the same value

sync a[point.1/2,point.2/2]=(point.1/2)**2+(point.2/2)**2

// This assignment will fail - moving to real arithmetic means that

// three different values [four values, but two are the same]

// are simultaneously assigned to each point

sync b[point.1/2,point.2/2]=(point.1/2.0)**2+(point.2/2.0)**2

}

The assign-one-value rule may often be enforced at compile-time and is straightforward to test at runtime.

In most straightforward code, there is no ambiguity regarding the ordering of operations on different strands. However, when complex control statements are present it may be necessary to provide additional information to clarify the respective sequencing of operations between strands that follow different paths through the code. This is achieved by labelling statements. Statements with the same label are synchronised between strands and then executed one after the other, at least in terms of side effects in the order that they appear in the source file, before any strand is permitted to proceed further:

var success=false

for point in search_space {

if point_is_enabling(point) {

if test_enabling_point(point): found: sync success=true

} else {

if test_blocking_point(point): found: sync success=false

}

In the above code, the final value of success will be true if any enabling point meets the enabling criterion but no blocking point meets the blocking criterion. To achieve this, parallel strands are synchronised as follows:

All strands will proceed until they encounter a communicating statement (in this case one of statements labelled found.)
If the communicating statements encountered by different strands reside on different control paths (if statement branches), then those statements must be labelled and have the same label; otherwise a compile-time error is raised.
The assignment success=true will then execute simultaneously on all strands for which test_enabling_point has been evaluated and returned true.
The assignment success=false will then execute simultaneously on all strands for which test_blocking_point has been evaluated and returned true,
All strands are then allowed to proceed forward from their respective labelled statement.

In practice, labelling is rarely necessary - the ~5000 line standard PM library module does not use it at all.