0%

Common Performance Metrics

  • Latency: response time, execution time, elapsed time

  • Throughout: Amount work per unit time
    Ideally, for good Performance we want is Latency minimized ad Throughout maximized

CPU time

CPU time is CPU spends on executing a program.
cpu time is comprised of three components
$$
\text{CPU time} = \frac{\text{number of instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instructions}} \times \frac{\text{seconds}}{\text{cycle}}
$$

  • instruction count = instructions/ program
  • CPI = average Cycles/ instruction
  • clock period(clock frequent = 1/ clock period)

Amdahl’s law

Suppose that enhancement (E) accelerates a fraction (F) of a task by a factor (S) and the remainder of the task is unaffected

$$
\text{over all speedup} = \frac{1}{1-F + \frac{F}{S}}
$$

Objected oriented programming(OOP)

In procedural programming languages(like C), programming like to be action-oriented, whereas in JAVA - programming is Object-oriented.

In OOP, programmer like to defined their own self-defined types, called classes.

Each class will contain the data and the set of the method that manipulate the data.

A user defined type(eg.class) is called a object.

Inheritance

Inheritance relations form tree-like hierarchical structure.
(eg. postgraduate students is a subclass of students)

In a “Is a” relationship, an object of a subclass may also be treated as an object
of superclass.

In a “Has a” relationship, a class object has an object of another class to store its
state or do its work. It “has-a” reference to that other object.

Designing a Class

  1. Always try to keep data private
  2. Creating an object may require different actions such as initialization
  3. Break up class with many responsibilities

Mips interpretion

Here is a traditional loop in C

1
2
while(save[i] == K)
i+= 1;

Assume that i and k correspond to registers $s3 and $s5 and the base of the array save is in $s6. What is the MIPS assembly code corresponding to this C segment?

1
2
3
4
5
6
7
Loop: sll  $t1 $s3 2 
add $t1 $t1 $s6
lw $t0,0($t1)
bne $t0,$s5, Exit
addi $s3,$s3, 1
j loop
Exit:

Signed Versus Unsigned Comparison

Suppose register $s0 has the binary number
$$ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111_{two}$$
and that register $s1 has the binary number
$$ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0001_{two}$$
where are the values pf registers $t0 and $t1 after these two instructions?

1
2
slt      $t0, $s0, $s1
sltu $t1, $s0, $s1

The key spot of the question is $s0 represents $-1_{ten}$ if it is a signed integer and $2^{32} - 1_{ten}$ if it is an unsigned integer. And the value in $s1 represents $1_{ten}$ in either case.

Jump opreation in MIPS

Instruction Syntax Address Source Saves Return Address Operation Common Use
J j target 26-bit immediate No PC = (PC+4)[31:28] || target || 00 Unconditional jumps, loops
JAL jal target 26-bit immediate Yes (in $ra) $ra = PC + 4
PC = (PC+4)[31:28] || target || 00
Function calls
JR jr $rs Register content No PC = $rs Function returns, indirect jumps

The jump register instruction jumps to the address stored in register $ra which is just what we want. Thus, the calling program, or caller, puts the parameter values in $a0–$a3 and uses jal X to jump to procedure X (sometimes named the callee). The callee then performs the calculations, places the results in $v0 and $v1, and returns control to the caller using jr $ra.

Using more registers

We may have such question, What if we have a procedure which require more than 4 arguments and 2 return value registers. This situation is an example in which we need to spill registers to memory.

Here we need to use a data structure called stack. A stack needs a pointer to the most recently allocated address in the stack to show where the next procedure should place the registers to be spilled or where old register values are found.By historical precedent, stacks “grow” from higher addresses to lower addresses.

Compiling a C Procedure That Doesn’t Call Another Procedure:

1
2
3
4
5
6
int leaf_example(int g, int h, int i, int j) 
{
int f;
f = (g + h) – (i + j);
return f;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
addi $sp, $sp, –12 
sw $t1,8($sp)
sw $t0,4($sp)
sw $s0,0($sp)

add $t0,$a0,$a1
add $t1,$a2,$a3
sub $s0,$t0,$t1

add $v0,$s0,$zero

lw $s0, 0($sp)
lw $t0, 4($sp)
lw $t1, 8($sp)
addi $sp,$sp,12
jr $ra

Decoding machine code

What is the assembly language statement corresponding to this machine instruction?
00af8020hex

The first step in converting hexadecimal to binary is to find the op fields:

1
2
0    0    a    f    8    0    2    0  
0000 0000 1010 1111 0100 0000 0010 0000

op field is 000000 in op(31:26) rs is 00101 rt is 01111 rd is 10000 shamt is 00000 funct is 100000

By look up table we can know it is
add $s0, $a1, $t7

I. What is the range of addresses for conditional branches in MIPS (K = 1024)?

  1. Addresses between 0 and 64K # 1
  2. Addresses between 0 and 256K # 1
  3. Addresses up to about 32K before the branch to about 32K after
  4. Addresses up to about 128K before the branch to about 128K after

Conditional branches use I-type format
16-bit immediate field for the branch offset
The offset is signed
The offset represents the number of words to branch not bytes
Hence it will be $2^{16}$ words, $2^{15}$ words, $2^{17}$ bytes ahead and backward.
which is 128KB ahead and backward.

typical steps of processor design

For datapath

  1. Analyse instruction set to determine the datapath requirements

  2. Select a set of hardware components for the datapath and establish clocking methodology

  3. Assemble datapath to meet the requirements

For control

  1. Analyse implementation of each instruction to determine control points/signals on the datapath

  2. Assemble the control logic

step 1

For each instruction, its requirement can be identified as a set of transfer operation.

Register lever language(RTL) is used to describe the instruction execution.eg.

1
$3 <-- $2 + $1 is for ADDU $3 $2 $1

Datapath mush support each transfer operation.

All instruction start by fetching instruction.

THen followed by different operation
All instruction should be followed with a

1
PC <- PC + 4

eg

1
2
BEQ 
If (R[rs] == R[rt]) then PC <- PC +4 + sign_ext(Imm16) || 00 else PC <- PC + 4

step 2

For combinational logical elements, we need adder, MUX, ALU obviously and

For storage we need some registers (which is 32-bit input and output, and Write enable input) and memory(which is have 32 bit data in and 32 bit date out)

For simple and robust, we use Edge triggered clocking, all storage elements are usually clocked by the same clock edge.

We have 3 time issues need to be considered. Clock skew, setup time, hold time.

2 requirement
$\text{cycle time} >= \text{clk to Q} + \text{longest delay path} + \text{setup} + |\text{clk skew}|$
$\text{clk to Q} + \text{shortest delay path} - \text{clock skew} > |\text{hold time}|$

step 3

Put the selected components together

step 4

instruction encoding defines how instruction and their argument are represented as binary values - machine instructions.