CS 4960-01 Homework 1

Due: Wednesday, September 3rd, 2008 10:45am

Submit a single program consistent with part 2 (at least) on a CADE machine using

handin cs4960-01 hw1 <file>

Part 1

Start with one of the following:

hw1.c—you'll also need popp_thread.h from count3s.zip. You can use any programming environment you want. To compile at a command line, you'd normally use
gcc -O2 -Wall hw1.c -lpthread
which produces an executable names a.out.
HW1.java—you'll also need PoPPThread.java from count3s.zip. You can use any programming environment you want. To compile at a command line, you'd normally use
javac PoPPThread.java HW1.java
and then run using
java HW1

The main program currently calls sequential_sums or sequentialSums. Your job is to change it to call parallel_sums or parallelSums, to implement the corresponding function, and to obtain a performance improvement (while getting the same result) by using two threads instead of just one.

The sequential program is a kind of prefix sum, except that each value in the original array is converted before it is added to the running count. You must not change the convert function or make any assumptions from outside the function about the computation that it performs, because convert is meant to simulate some useful, expensive computation.

Before putting an accumulted result back into the array, the program deconverts it. Again, you should not (yet) change deconvert or make any assumptions about what it does. For now, though, it's meant to simulate an inexpensible calculation.

Using two threads, you should be able to make the program run about twice as fast. The key is to perform the expensive work in parallel, which will require a new array to store converted values. Can you get a roughly 2x speedup using an extra array that's only half as large as the original?

Part 2

Change deconvert so that it calls convert:

static int deconvert(int v)
{
return convert(v);
}

The correct output of the program is now 2 0 2. If you change the first slot of the initial array to contain 3 instead of 2, then the correct output is 6 2 2.

With this change, deconvert is an expensive operation whose cost is about the same as convert.

Adjust your program as needed (if at all) to run about twice as fast as the sequential when using the new deconvert.

Part 3 (optional)

How small can you make the extra array and still obtain a good speedup? You might, for example, break up the original array into 10 pieces, then solve in sequence each piece, using parallelism within a piece. When you use enough pieces, though, the overhead of thread creation swamps the benefits of parallelism.

Last update: Monday, December 1st, 2008

mflatt@cs.utah.edu