CS 4960-01 Homework 10

Due: Monday, November 17th, 2008 10:45am

Using Titanium, implement the red-blue problem of HW7.

Be sure to read the README.txt file in ti_ex.zip.

Start with HW10.java, which is a sequential Java solution to the red-blue problem. You'll also need BSDRand.ti (which you could rename to BSDRand.java to compile HW10.java with Java, but see README.txt for a warning on file extensions).

Part 1

Port the Java program to Titanium.

Your Titanium solution should use char [2d] arrays instead of Java-style char[][] arrays. So, first convert the Java code to Titanium code by changing the representation and use of m.

Of course, your solution should also support parallelism. In particular, use the same distribution of work as in the HW9 solution: each process handles a set of blocks for checking the end condition, a set of whole rows for moving reds, and a set of whole columns for moving blues. In SMP mode, you should be able to get a 1.4 to 1.8 speed-up.
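As a concrete (unofficial) illustration of this work split, the sketch below uses a round-robin ownership rule: with P processes, process p takes every P-th row for red moves and every P-th column for blue moves. The class and method names are ours, not the handout's, and a contiguous-chunk assignment would serve equally well.

```java
// Sketch of an HW9-style work split (names ours, not the handout's).
// Round-robin ownership: process p owns row r iff r mod P == p,
// and likewise for columns.
public class WorkSplit {
    // true iff process p (of P total) handles row r for red moves
    static boolean ownsRow(int p, int P, int r) {
        return r % P == p;
    }

    // true iff process p (of P total) handles column c for blue moves
    static boolean ownsCol(int p, int P, int c) {
        return c % P == p;
    }

    public static void main(String[] args) {
        int P = 4, n = 10;
        // sanity check: every row has exactly one owner
        for (int r = 0; r < n; r++) {
            int owners = 0;
            for (int p = 0; p < P; p++)
                if (ownsRow(p, P, r)) owners++;
            if (owners != 1) throw new AssertionError("row " + r);
        }
        System.out.println("rows partitioned");
    }
}
```

In Titanium, each process would obtain its id and the process count via Ti.thisProc() and Ti.numProcs() and loop over only the rows and columns it owns.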

For this part, communicate a shared array from a root process to all other processes, and have all processes work on the same array.

Hand in your solution to this part as RB.ti.

Part 2

The solution from part 1 will run much more slowly in MPI mode than in sequential mode. With two processes, in fact, performance is disastrously bad; the fine-grained sharing of the global array produces far too much communication.

Adjust your solution to explicitly copy individual blocks, rows, and columns (as needed) from the shared global array to a local array, and copy each changed row or column back to the global array after it has been processed. By using this kind of explicit communication, you should be able to get the run time with MPI and multiple processes to be about the same as MPI with a single process, which is still slower than the sequential version.
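The copy-in/copy-out pattern for a single row can be sketched in plain Java as follows (the helper names are ours, and the red half-step shown assumes the usual red-blue rules: each 'r' moves one cell right into an empty cell ' ', with wraparound, all moves judged against the row's old contents). In Titanium, the two System.arraycopy calls would become bulk copies between a process-local array and the shared [2d] array.

```java
// Copy-in / copy-out sketch for Part 2 (names ours, not the
// handout's): pull one row of the global board into a local buffer,
// update the buffer, then write the changed row back.
public class RowCopy {
    // One red half-step on a row: 'r' moves right into ' ', with
    // wraparound; all moves are decided against a snapshot of the
    // old row, so simultaneous moves don't interfere.
    static void redStep(char[] row) {
        char[] old = row.clone();
        int n = row.length;
        for (int j = 0; j < n; j++) {
            int k = (j + 1) % n;
            if (old[j] == 'r' && old[k] == ' ') {
                row[j] = ' ';
                row[k] = 'r';
            }
        }
    }

    static void processRow(char[][] global, int r) {
        int n = global[r].length;
        char[] local = new char[n];
        System.arraycopy(global[r], 0, local, 0, n);  // copy in
        redStep(local);                               // local work
        System.arraycopy(local, 0, global[r], 0, n);  // copy back
    }

    public static void main(String[] args) {
        char[][] g = { "rr ".toCharArray() };
        processRow(g, 0);
        System.out.println(new String(g[0]));  // prints "r r"
    }
}
```

The point of the copies is granularity: one bulk transfer per row or column replaces the per-element remote accesses that make the Part 1 solution so slow under MPI.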

In this case, the work performed on the array is so simple compared to the communication costs (with the prescribed organization of work) that cluster processing cannot help.

Hand in your solution to this part as RBCopy.ti.

Part 3 (optional, DOUBLE extra credit)

A different organization of data and work can allow MPI mode to provide speed-up with multiple processes, at least compared to MPI with a single process. (That MPI with a single process is so much slower than the sequential version seems to be a limitation of Titanium, which is mostly designed to use a different message-passing system.)

Instead of having processes alternately take whole rows and columns, find a different way to distribute or communicate the data that enables a speed-up for multiple MPI processes compared to a single MPI process. (It's ok to still be slower than the sequential version, since we chalk that up to a compiler limitation.)

One possible strategy is to allocate to each process a set of rows that covers whole T by T sub-arrays, so that no array data needs to be communicated for end-condition checking or red movement, and only boundary rows need to be communicated for blue movement.
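One way to compute such a block-aligned partition is sketched below (the names are ours, not part of the assignment). For an n-row board with n = B*T, the B bands of T rows are dealt out contiguously, so each process's rows cover whole T-by-T blocks, and only the first and last row of each process's range need to be exchanged for blue movement.

```java
// Sketch of a block-aligned row partition for Part 3 (names ours).
// B = number of T-row bands; process p of P owns the bands
// [bandStart(p), bandStart(p+1)), i.e. rows [lowRow(p), lowRow(p+1)).
public class BandSplit {
    // first band owned by process p, distributing B bands as
    // evenly as possible over P processes
    static int bandStart(int p, int P, int B) {
        return (B * p) / P;
    }

    // first row owned by process p; every boundary is a multiple
    // of T, so each process covers whole T-by-T blocks
    static int lowRow(int p, int P, int B, int T) {
        return bandStart(p, P, B) * T;
    }

    public static void main(String[] args) {
        int P = 3, B = 8, T = 5;
        for (int p = 0; p <= P; p++) {
            int lo = lowRow(p, P, B, T);
            if (lo % T != 0) throw new AssertionError();
            System.out.println("process " + p + " starts at row " + lo);
        }
    }
}
```

With this layout, end-condition checks and red moves touch only rows a process already owns; blue moves need just the row below each band boundary from the neighboring process.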


Last update: Monday, December 1st, 2008
mflatt@cs.utah.edu