During spring semester 2018, I will serve as teaching assistant for CS 222 — Computer Architecture. This course is taught by Professor Alan Ling, and it meets Tuesday and Thursday from 2:50 PM to 4:05 PM in Lafayette L200.

You can find Professor Ling’s page for this course here, and announcements here.

As TA, I’ll be focusing on software simulations.

My office hours for this course are by appointment only. During this semester, Tuesday afternoons, Thursdays, and Fridays are best. I can be reached at clayton (dot) cafiero (at) (you know the rest).

Topics - Fundamentals of computer design, performance and cost, instruction set architecture, processor technology, pipelining, scoreboarding and Tomasulo algorithm, memory system (caches, virtual memory), input/output systems and bus communications, and parallel processors.

• As regards index, tag, and offset, it’s best not to think so much in terms of calculating them but rather in terms of reading them.

Look at page B-14. Once you know the number of index bits and the number of offset bits, you can calculate the number of tag bits. These values will not vary for any particular configuration. Once you have them, you know how to divide any address into the appropriate chunks.
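As a rough sketch of that calculation (not the assignment's required interface), the field widths can be derived once from the configuration. The names `FieldWidths`, `field_widths`, and `log2_exact` are made up for illustration, and the sketch assumes block size and number of sets are powers of two, with sets = cache size / (block size × associativity):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: log base 2 of n, assuming n is a power of two.
unsigned log2_exact(std::uint64_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);  // power-of-two check
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

// Hypothetical container for the three field widths, in bits.
struct FieldWidths {
    unsigned tag;
    unsigned index;
    unsigned offset;
};

// Assumed parameters: address width in bits, total cache size in bytes,
// block size in bytes, and associativity (ways per set).
FieldWidths field_widths(unsigned address_bits,
                         std::uint64_t cache_bytes,
                         std::uint64_t block_bytes,
                         std::uint64_t associativity) {
    const std::uint64_t sets = cache_bytes / (block_bytes * associativity);
    const unsigned offset = log2_exact(block_bytes);  // selects a byte in a block
    const unsigned index = log2_exact(sets);          // selects a set
    // Whatever is left of the address is the tag.
    return FieldWidths{address_bits - index - offset, index, offset};
}
```

For example, with 32-bit addresses, a 64 KiB cache, 64-byte blocks, and 2-way associativity (numbers chosen arbitrarily), `field_widths(32, 64 * 1024, 64, 2)` gives 6 offset bits, 9 index bits, and 17 tag bits.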

There are a number of ways to implement this. Consider using bit masking. Knowing the number of bits, you can create a bit mask for each portion of the address, and extract index, tag and offset using these masks and bitwise operations – & and (if needed) >>. For instance (not real address, just for illustration):

address  1010001001001001010010


If you’re working with int (or long int, whatever), bitwise operations will act on the binary representation of the value. For example, if a = 5 and b = 9, then a & b = 1. Do you see why? Because 5 is 0101 and 9 is 1001, so bitwise a & b is 0001. Make sense?
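Here is a minimal sketch of the masking approach, assuming the address layout is tag, then index, then offset (most significant to least significant) and that the field widths are already known. `AddressParts` and `split_address` are illustrative names, not part of any provided code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the three pieces of an address.
struct AddressParts {
    std::uint64_t tag;
    std::uint64_t index;
    std::uint64_t offset;
};

// Split an address using masks and shifts.
// Assumes index_bits + offset_bits is less than 64.
AddressParts split_address(std::uint64_t address,
                           unsigned index_bits,
                           unsigned offset_bits) {
    assert(index_bits + offset_bits < 64);
    // A mask of n one-bits is (1 << n) - 1.
    const std::uint64_t offset_mask = (std::uint64_t{1} << offset_bits) - 1;
    const std::uint64_t index_mask = (std::uint64_t{1} << index_bits) - 1;

    AddressParts parts;
    parts.offset = address & offset_mask;                   // lowest bits
    parts.index = (address >> offset_bits) & index_mask;    // middle bits
    parts.tag = address >> (offset_bits + index_bits);      // remaining high bits
    return parts;
}
```

Taking the illustrative address 1010001001001001010010 with (arbitrarily) 6 index bits and 4 offset bits, `split_address(0x289252, 6, 4)` yields offset 0010, index 100101, and tag 101000100100. You read the fields off the address; nothing is really being calculated.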

• I’ve pushed sample code for the pipeline simulation project to UVM GitLab.

This code is unmodified from the way it was submitted in spring of 2016, and the assignment was slightly different from what you were assigned this semester, so please don’t expect perfect correspondence between the specifications that were provided to you and this particular sample. Nevertheless, it should serve as an example of one possible implementation.

• Overview - There was a wider range of outcomes for this project, which was to be expected. The breadth of this range was rather disappointing. Some teams did very well – their code performed as expected, accepted the specified input parameters, generated valid and accurate output, and was (relatively) easy to read. Other teams seemed to have had some difficulty.

Please take advantage of the offer to have your code reviewed while in progress. Whatever WIP you hand in will not affect your grade. The grade will depend on the final submission only. A review can be of great value: first, because you’re forced to prepare something well in advance of the final deadline, even if it’s incomplete; second, because you get feedback before you are graded. Don’t be shy.

• Please see your individualized team feedback and read the general feedback on Project 01 - Pipeline Simulation before starting on your cache simulation. Thanks.

Due: Thursday, 3 May by 12:00 noon

Cache simulation assignment

Traces and sample configs

• Some teams have asked for additional sample output, in order to check their result against a standard. I think it would be best to devise your own tests. This should not be too much trouble since you only need to see what happens within the length of the pipeline. If your simulator accurately detects hazards and correctly inserts stalls for twenty lines of a trace, it should work for 20,000. There’s no difference in kind that’s introduced by having a longer trace.

That said, here are a few data points that will help you confirm you are on the right track (or not), reading gcc-1K.trace:

## Five stages, read at 2, result at 5

Pipeline simulation:
1,000 instructions traced
...
1,652 cycles (including stalls)
...


and

## Eight stages, read at 2, result at 7

 ID        Progress
--------  -------------
100       1 · · · · 2 3 4 5 6 7 8
101                 1 2 3 4 5 6 7 8
102                   1 · · · · 2 3 4 5 6 7 8
103                             1 2 3 4 5 6 7 8
104                               1 · · · · 2 3 4 5 6 7 8
105                                         1 2 3 4 5 6 7 8
106                                           1 · · · · 2 3 4 5 6 7 8
107                                                     1 2 3 4 5 6 7 8


Don’t read anything into my choices of parameters here. I just pulled them out of thin air.
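If you want a quick sanity check for your own small tests, the stall counts in the diagram above are consistent with a simple rule. This is inferred from the sample output, not taken from the assignment text, so treat it as an assumption: a dependent instruction may enter the read stage in the same cycle its producer is in the result stage. The function name `stalls_needed` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Stalls for a consumer issued `distance` instructions after its producer
// (distance 1 means the very next instruction), assuming the producer itself
// is not stalled after its read stage. Assumed rule: the consumer may be in
// the read stage during the same cycle the producer is in the result stage.
int stalls_needed(int read_stage, int result_stage, int distance) {
    assert(read_stage >= 1 && result_stage >= read_stage && distance >= 1);
    return std::max(0, (result_stage - read_stage) - distance);
}
```

With read at 2 and result at 7, `stalls_needed(2, 7, 1)` is 4, matching the four dots in the eight-stage diagram. With read at 2 and result at 5, an immediately dependent instruction would stall 2 cycles under the same assumption, and dependencies three or more instructions apart would not stall at all.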

Some people have asked why we would have more than five stages in the pipeline. There are many different architectures, with different pipeline lengths. Some CPUs have only two, some have (yes) over a thousand.

• AMD Athlon 64 X2 has 12 stages
• Intel Pentium D has 31 stages
• Intel Pentium 4 Prescott has 31
• ARM 11 has 8 stages
• ARM Cortex-A15 MPCore has 15

The point of this exercise is not to model the MIPS pipeline specifically, but rather to build a simulator we can use to observe the effect of pipeline length on handling stalls (among other things).

We will perform similar abstractions / generalizations when modeling multi-level cache.

Make sense?

Hope this helps!

• Hey. Don’t be writing your own parsers. The C++ standard library is your friend:

• std::getline
• std::stringstream
• std::istream_iterator
• std::vector

Don’t reinvent the wheel.
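For instance, a sketch along these lines splits each line of input into tokens using only the standard library. It assumes fields are separated by whitespace, which may or may not match your trace format, and `tokenize` and `read_records` are just illustrative names:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Split one line into whitespace-separated tokens.
std::vector<std::string> tokenize(const std::string& line) {
    std::stringstream in(line);
    // The default-constructed istream_iterator marks end-of-stream.
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}

// Read every non-empty line from a stream (a file, std::cin, or a string
// stream) and return one token vector per line.
std::vector<std::vector<std::string>> read_records(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue;  // skip blank lines
        }
        records.push_back(tokenize(line));
    }
    return records;
}
```

Because `read_records` takes a `std::istream&`, the same function works with a `std::ifstream` opened on a trace file or with a `std::stringstream` holding a few hand-written test lines.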

P.S. It won’t be factored into grading, but you may be interested to know that I’m decidedly in the camp that frowns upon

using namespace foo;