#### Performance Assignment

This performance assignment is based on the one by Bryant and O’Hallaron for Computer Systems: A Programmer’s Perspective, Third Edition

Due: Wednesday, October 3, 11:59pm

This assignment deals with optimizing memory intensive code, and the area of image processing offers many examples of functions that can benefit from optimization. In this lab, we will consider two image processing operations: pinwheel, which rotates a diamond-shaped portion of an image counter-clockwise by 90 degrees while also converting to grayscale, and glow, which blurs an image to simulate glowing colors in a fog.

These instructions are long, but the lab itself may not be too time-consuming to get the threshold results required for full credit. The potential upside for clever optimizations is anyone’s guess.

##### Image Representation

For this lab, we will consider an image to be represented as a size followed by a two-dimensional matrix M, where Mi,j denotes the value of (i,j)th pixel of M. Pixel values are triples of red, green, and blue (RGB) values. We will only consider square images. Let N denote the number of rows (or columns) of an image. Rows and columns are numbered, in C-style, from 0 to N-1.

##### Pinwheel Operation

Given this representation, the pinwheel operation applies to a diamond-shaped center portion of the image. The diamond center is defined as all the pixels that are completely bounded by lines drawn from the center of one of the image to the center of an adjacent edge. For example, in an image of size 8, the diamond pixels are the ones filled in the following image:

To determine whether a pixel indexed as (i,j) is in the diamond for an image of size N, add 1/2 to each component to conceptually move to the center of the pixel, subtract N/2 from each component to effectively find the distance to the image’s midpoint, then check whether the absolute values of the adjusted components sum to less than N/2:

abs(i+1/2-N/2) + abs(j+1/2-N/2) < N/2

Within the diamond portion, the following transformation applies (using the original, unadjusted index components):

• Transpose: For each (i,j) pair, Mi,j and Mj,i are interchanged.

• Exchange rows: Row i is exchanged with row N-1-i. Exchanging rows after transposing has the effect of rotating an image by 90 degrees.

• Grayscale: Set the red, green, and blue components all to the average of the three components.

For example, applying pinwheel to

produces

##### Glow Operation

The glow operation is implemented by replacing every pixel value with a combination of nine pixels: the pixels that form a 3x3 block with the target pixel in the center. Pixels in the source image are weighted as follows:

 0.16 0 0.16 0 0.3 0 0.16 0 0.16

That is, the new value of Mi,j is computed as

 Mi-1,j-1×0.16 + Mi+1,j-1×0.16 + Mi,j×0.30 + Mi-1,j+1×0.16 + Mi+1,j+1×0.16

For the purposes of computing Mi,j’s value, neighbor pixels beyond the edge of the image are treated as black.

For example, applying glow to

produces

##### Setup

Start by copying perflab-handout.zip to a protected directory in which you plan to do your work. Then, run the command:

\$ unzip perflab-handout.zip

This will cause a number of files to be unpacked into the directory. The only file you will be modifying and handing in is "kernels.c". The "driver.c" program is a driver program that allows you to evaluate the performance of your solutions. Use the command make driver to generate the driver code and run it with the command ./driver.

##### Data Structures

The core data structure deals with image representation. A pixel is a struct as shown below:

 typedef union { struct { unsigned short red;   /* R value */ unsigned short green; /* G value */ unsigned short blue;  /* B value */ }; int dim; } pixel;

An image I is represented as a one-dimensional array of pixels. The first “pixel” in an image uses dim to report the dimension of the image (i.e., the height and width, which are the same). Each subsequent pixel uses the red, green, and blue fields for one pixel’s 16-bit RGB values. The (i,j)th pixel of an image I is I[RIDX(i,j,I->dim)], where RIDX is a macro defined as follows:

 #define RIDX(i,j,n) (1+(i)*(n)+(j))

See the file "defs.h" for this code.

The pinwheel and glow functions receive two pixel* pointers representing source and destination images. The source image must not be changed, and the destination image must be filled with the result of transforming the source. The source and destination images have the same dimensions, and the destination dimension is already filled in when pinwheel or rotate is called.

##### Pinwheel Implementation

The following C function computes the result of pinwheeling the source image src and stores the result in destination image dst. It implements the diamond-inclusion test plus all three transformations (transpose, exchange, and grayscale) in a single pass.

 void naive_pinwheel(pixel *src, pixel *dest) { int i, j; for (i = 0; i < src->dim; i++) for (j = 0; j < src->dim; j++) { /* Check whether we're in the diamond region */ if ((fabs(i + 0.5 - src->dim/2) + fabs(j + 0.5 - src->dim/2)) < src->dim/2) { /* In diamond region, so rotate and grayscale */ int s_idx = RIDX(i, j, src->dim); int d_idx = RIDX(src->dim - j - 1, i, src->dim); dest[d_idx].red = ((int)src[s_idx].red + src[s_idx].green + src[s_idx].blue) / 3; dest[d_idx].green = ((int)src[s_idx].red + src[s_idx].green + src[s_idx].blue) / 3; dest[d_idx].blue = ((int)src[s_idx].red + src[s_idx].green + src[s_idx].blue) / 3; } else { /* Not in diamond region, so keep the same */ int s_idx = RIDX(i, j, src->dim); int d_idx = RIDX(i, j, src->dim); dest[d_idx] = src[s_idx]; } } }

The above code scans the rows of the source image matrix, copying to the columns of the destination image matrix. Your task is to rewrite this code to make it run as fast as possible using techniques like code motion and loop reorganizations.

See the file "kernels.c" for this code.

##### Glow

The glow function takes as input a source image src and returns the blurred result in the destination image dst. Here is part of an implementation:

 void naive_glow(pixel *src, pixel *dst) { int i, j; for (i = 0; i < src->dim; i++) for (j = 0; j < src->dim; j++) dst[RIDX(i, j, src->dim)] = weighted_combo(src->dim, i, j, src); }

The function weighted_combo performs the weighted combination of the pixels around the (i,j)th pixel. Your task is to optimize glow (and weighted_combo) to run as fast as possible. (Note: The function weighted_combo is a local function and you can change it or get rid of it altogether to implement glow in some other way.)

This code and an implementation of weighted_combo are in the file "kernels.c".

##### Performance measures

Our main performance measure is CPE or Cycles per Element. If a function takes C cycles to run for an image of size N×N, the CPE value is C/N2. When you build and driver its output shows CPE results for 5 different values of N. The baseline measurements were made on a CADE lab1-n machine.

The ratios (speedups) of the optimized implementation over the naive one will constitute a score of your implementation. To summarize the overall effect over different values of N, we will compute the geometric mean of the results for these 5 values. See Evaluation for more information on grading.

##### Assumptions

To make life easier, you can assume that N is a multiple of 32. Your code must run correctly for all such values of N but we will measure its performance only for the 5 values reported by driver.

##### Infrastructure

We have provided support code to help you test the correctness of your implementations and measure their performance. This section describes how to use this infrastructure. The exact details of each part of the assignment are described in the following section.

Note: The only source file you will be modifying is "kernels.c".

##### Versioning

You will be writing many versions of the pinwheel and glow routines. To help you compare the performance of all the different versions you’ve written, we provide a way of “registering” functions.

For example, the file "kernels.c" that we have provided you contains the following function:

 void register_pinwheel_functions() { add_pinwheel_function(&pinwheel, pinwheel_descr); }

This function contains one or more calls to add_pinwheel_function. In the above example, add_pinwheel_function registers the function pinwheel along with a string pinwheel_descr which is an ASCII description of what the function does. See the file "kernels.c" to see how to create the string descriptions. This string can be at most 256 characters long.

A similar function for your glow kernels is provided in the file driver.c.

##### Driver

The source code you will write will be linked with object code that we supply into a driver binary. To create this binary, you will need to execute the command

\$ make driver

You will need to re-make driver each time you change the code in "kernels.c".

To test your implementations, you can then run the command:

\$ ./driver

The driver can be run in four different modes:

• Default mode, in which all versions of your implementation are run.

• Autograder mode, in which only the pinwheel and glow functions are run. This is the mode we will run in when we use the driver to grade your handin.

• File mode, in which only versions that are mentioned in an input file are run.

• Dump mode, in which a one-line description of each version is dumped to a text file. You can then edit this text file to keep only those versions that you’d like to test using the file mode. You can specify whether to quit after dumping the file or if your implementations are to be run.

If run without any arguments, driver will run all of your versions (default mode). Other modes and options can be specified by command-line arguments to driver, as listed below:

• -g Run only pinwheel and glow functions (autograder mode).

• -f funcfile Execute only those versions specified in funcfile (file mode).

• -d dumpfile- Dump the names of all versions to a dump file called dumpfile, one line to a version (dump mode).

• -q Quit after dumping version names to a dump file. To be used in tandem with -d. For example, to quit immediately after printing the dump file, type ./driver -qd dumpfile.

• -i Write the main test images to files that end in ".image". Each file contains N twice to indicate the height and width of the image, and it contains the pixel values as a sequence of red, green, and blue numbers.

Combined with -m gradient or -m squares and viewed with a suitable decoder, these dumps can be helpful in debugging problems with your functions.

One way to view a file’s image is to run

\$ racket show-image.rkt filename

on a CADE machine, or you can use

\$ racket show-image.rkt --png filename

to generate a ".png" version.

• -I Write all the benchmarks images to files that end in ".image".

• -m mode Selects the starting image, where mode can be random, gradient, or squares. The default (and grading mode) is random.

• -h Print the command line usage.

##### Optimizing Pinwheel (50 points)

In this part, you will optimize pinwheel to achieve as low a CPE as possible. You should compile driver and then run it with the appropriate arguments to test your implementations.

For example, running driver with the supplied naive version (for pinwheel) generates the output shown below:

 \$ ./driver Pinwheel: Version = naive_pinwheel: baseline implementation: Dim 64 128 256 512 1024 Mean Your CPEs 13.2 13.3 13.1 13.2 15.2 Baseline CPEs 13.2 13.3 13.1 13.2 15.2 Speedup 1.0 1.0 1.0 1.0 1.0 1.0

##### Optimizing Glow (50 points)

In this part, you will optimize glow to achieve as low a CPE as possible.

For example, running driver with the supplied naive version (for glow) generates the output shown below:

 \$ ./driver Glow: Version = naive_glow: baseline implementation: Dim 32 64 128 256 512 Mean Your CPEs 209.9 214.3 216.9 217.9 221.0 Baseline CPEs 212.0 217.0 219.0 220.0 221.0 Speedup 1.0 1.0 1.0 1.0 1.0 1.0

##### Coding Rules

You may write any code you want, as long as it satisfies the following:

• It must be in ANSI C with GNU extensions. You may not use any embedded assembly language statements or special gcc directives.

• It must not interfere with the time measurement mechanism. You may also be penalized if your code prints any extraneous information.

You can only modify code in "kernels.c". You are allowed to define macros, additional global variables, and other procedures in these files.

##### Evaluation

Your solutions for pinwheel and glow will each count for 50% of your grade. The score for each will be based on the following:

• Correctness: You will get NO CREDIT for buggy code that causes the driver to complain! This includes code that correctly operates on the test sizes, but incorrectly on image matrices of other sizes. As mentioned earlier, you may assume that the image dimension is a multiple of 32.

• CPE: You will get full credit for your implementations of pinwheel and glow if they are correct and achieve mean CPEs improvements at or above thresholds 2.5 and 4.0, respectively. In more detail:

• pinwheel
• between 1.0–2.5 mean speedup: linear mapping to 0–50 points

• above 2.5 mean speedup: 50 points

• glow
• between 1.0–4.0 mean speedup: linear mapping to 0–50 points

• above 4.0 mean speedup: 50 points

##### Hand In Instructions

When you have completed the lab, you will hand in one file, "kernels.c", that contains your solution. Use Canvas to hand in your work.

Make sure that the pinwheel and glow functions correspond to your fastest implementations, as these are the only functions that will be tested when we use the driver to grade your assignment.

Remember also to remove any extraneous print statements.