Shown below is a VLIW system in which each long instruction word generated by the compiler is a packet or bundle of three individual machine operations. The hardware system contains two integer units and one floating point unit. Therefore, up to 2 integer operations and 1 floating point operation can be performed in parallel in the same clock cycle. However, due to the mix of available instructions and possible dependencies, the system may not be able to make use of all the execution units for a particular cycle. In that case, the compiler would insert one or more nops into the instruction bundle (i.e., into the long instruction word) to fill any unused slots.
Show the contents of the long instruction words or bundles that the compiler would generate for the following group of instructions:
add $11,$2,$3
add.s $f4,$f5,$f6
sub.s $f14,$f16,$f1
Solution
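A minimal Python sketch of how a compiler might pack the three instructions above into long instruction words for this machine. It assumes that the three instructions shown are the whole group, that every word has the slot order [int, int, fp], and that an opcode ending in `.s` needs the floating-point unit. The function names are hypothetical.

```python
# Assumed slot layout of each long instruction word: [int, int, fp]
INT_SLOTS = 2
FP_SLOTS = 1


def unit_of(op):
    """Hypothetical rule: single-precision FP opcodes end in '.s'."""
    return "fp" if op.split()[0].endswith(".s") else "int"


def operands(op):
    """Split 'add $11,$2,$3' into ('$11', ['$2', '$3'])."""
    regs = [r.strip() for r in op.split(None, 1)[1].split(",")]
    return regs[0], regs[1:]


def pack(ops):
    """Greedily place each op in the earliest word that respects
    dependencies and still has a free slot of the right unit type."""
    words = []       # each word: {"int": [...], "fp": [...]}
    last_write = {}  # register -> index of the word that last wrote it
    last_read = {}   # register -> index of the latest word that reads it
    for op in ops:
        dest, srcs = operands(op)
        kind = unit_of(op)
        # RAW: after the producers of the sources; WAW: after the previous
        # writer of dest; WAR: not before a word that still reads old dest
        earliest = max(
            [last_write[s] + 1 for s in srcs if s in last_write]
            + ([last_write[dest] + 1] if dest in last_write else [])
            + ([last_read[dest]] if dest in last_read else [])
            + [0]
        )
        capacity = INT_SLOTS if kind == "int" else FP_SLOTS
        i = earliest
        while i < len(words) and len(words[i][kind]) >= capacity:
            i += 1
        if i == len(words):
            words.append({"int": [], "fp": []})
        words[i][kind].append(op)
        last_write[dest] = i
        for s in srcs:
            last_read[s] = max(last_read.get(s, 0), i)
    # Fill every unused slot with a nop
    return [
        w["int"] + ["nop"] * (INT_SLOTS - len(w["int"]))
        + w["fp"] + ["nop"] * (FP_SLOTS - len(w["fp"]))
        for w in words
    ]


for word in pack(["add $11,$2,$3", "add.s $f4,$f5,$f6", "sub.s $f14,$f16,$f1"]):
    print(" | ".join(word))
```

Under these assumptions the result is two long instruction words: `add $11,$2,$3 | nop | add.s $f4,$f5,$f6` and `nop | nop | sub.s $f14,$f16,$f1`. The three instructions are independent of each other, so the second word is needed only because the machine has a single floating-point unit.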
VLIW (Very Long Instruction Word):
Very long instruction word (VLIW) refers to processor architectures designed to exploit instruction-level parallelism.
Conventional processing units (CPUs) mostly allow programs to specify instructions to execute only in sequence; a VLIW processor allows programs to explicitly specify instructions to execute at the same time, concurrently, in parallel.
It is intended to allow higher performance without the complexity inherent in some other designs.
The VLIW architecture described here has the following features:
Operations are classified as implicit or explicit. Queue operations are implicit operations.
Heads of input queues and tails of output queues are mapped into the regular register space, so any reference to these registers implies a read or a write operation, respectively.
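A minimal Python model of queue-mapped registers, to make the implicit read and write concrete. The register numbers 30 and 31, the class name, and the use of a single input queue and a single output queue are assumptions; the text does not say which registers are mapped to queues.

```python
from collections import deque


class QueueMappedRegisters:
    """Register file in which two (hypothetical) register numbers are mapped to queues."""

    IN_QUEUE_REG = 30   # hypothetical: reading this register dequeues the input-queue head
    OUT_QUEUE_REG = 31  # hypothetical: writing this register enqueues at the output-queue tail

    def __init__(self, num_regs=32, input_values=()):
        self.regs = [0.0] * num_regs
        self.input_queue = deque(input_values)
        self.output_queue = deque()

    def read(self, r):
        if r == self.IN_QUEUE_REG:
            return self.input_queue.popleft()  # implicit dequeue on every reference
        return self.regs[r]

    def write(self, r, value):
        if r == self.OUT_QUEUE_REG:
            self.output_queue.append(value)    # implicit enqueue on every reference
        else:
            self.regs[r] = value


rf = QueueMappedRegisters(input_values=[1.0, 2.0, 3.0])
for _ in range(3):
    # One explicit multiply; the queue read and queue write happen implicitly.
    rf.write(QueueMappedRegisters.OUT_QUEUE_REG,
             rf.read(QueueMappedRegisters.IN_QUEUE_REG) * 2.0)
print(list(rf.output_queue))  # [2.0, 4.0, 6.0]
```

No separate load or store operation is needed to move data through the queues, which is why queue operations count as implicit.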
All other operations are explicit. Explicit operations are further classified as the more frequently executed or the less frequently executed operations.
The more frequently executed operations include floating-point multiplications, floating-point additions, memory accesses and index register updates, simple integer operations, and branching back to the beginning of the loop. All other operations are considered less frequent; examples include other floating-point operations such as divide and format conversions, logical integer operations, byte memory accesses and other non-loop branches. All dyadic floating-point operations have three operands, but dyadic integer operations have only two (that is, the destination register must be the same as one of the source registers).
There are two instruction formats for the VLIW instruction. One of them packs:
• a loop-back operation
• one floating-point adder operation
• one floating-point multiplier operation.
The example consists of only a single 96-bit instruction:
FOR i=0 TO n-1
DO
{
a[2*i]:= c+b[i]*d;
}
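Each iteration of the loop body needs one floating-point multiply (`b[i]*d`) and one floating-point add (`c + ...`), which matches the multiplier, adder and loop-back slots listed above. A minimal Python sketch, assuming the compiler overlaps iterations so that each long instruction multiplies for iteration `i` while adding for iteration `i-1` (software pipelining, which the text does not state explicitly). The function names are hypothetical.

```python
def loop_reference(b, c, d):
    """Sequential semantics of: FOR i=0 TO n-1 DO a[2*i] := c + b[i]*d."""
    a = [0.0] * (2 * len(b))
    for i in range(len(b)):
        a[2 * i] = c + b[i] * d
    return a


def loop_one_word_per_cycle(b, c, d):
    """Each pass models one 96-bit long instruction with three slots.

    Assumed schedule: the multiplier works on iteration `step` and the
    adder on iteration `step - 1`; one extra pass drains the last product.
    """
    n = len(b)
    a = [0.0] * (2 * n)
    in_flight = 0.0  # product handed from the multiplier to the adder
    for step in range(n + 1):
        # All slots read their inputs before any slot writes its result.
        product = b[step] * d if step < n else 0.0  # multiplier slot
        if step >= 1:
            a[2 * (step - 1)] = c + in_flight       # adder slot
        in_flight = product
        # Loop-back slot: branch to the top while work remains (the for loop here).
    return a


print(loop_one_word_per_cycle([1.0, 2.0, 3.0], 0.5, 2.0))  # [2.5, 0.0, 4.5, 0.0, 6.5, 0.0]
```

Under this schedule, n iterations take n + 1 long instructions instead of separate multiply, add and branch instructions per iteration.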
A simple VLIW machine allows the compiler to explicitly schedule more than one operation per long instruction; in a single cycle, the processor might simultaneously execute a load from memory, an integer addition, and a floating-point multiplication. This style of architecture is very closely related to the explicitly parallel instruction computing (EPIC) architecture.
Sample code:
MUL r10, a, b
MUL r12, d, e
ADD r11, r10, c
ADD r13, r11, r12
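The two multiplies in the sample code do not depend on each other, so a compiler can place them in the same long instruction; the two adds must follow in order. A minimal as-soon-as-possible scheduling sketch in Python, assuming a one-cycle latency for every operation and enough units (here, two multipliers); the function name is hypothetical.

```python
def asap_schedule(ops):
    """Group (text, dest, sources) triples into long instructions, as soon as possible.

    Assumptions: every operation has a one-cycle latency, the machine has enough
    units for whatever becomes ready together, and there are no WAR/WAW hazards
    (none occur in this sample).
    """
    ready_at = {}  # register -> first cycle in which its new value can be read
    words = {}     # cycle -> operations issued in that cycle
    for text, dest, srcs in ops:
        cycle = max([ready_at[s] for s in srcs if s in ready_at] + [0])
        words.setdefault(cycle, []).append(text)
        ready_at[dest] = cycle + 1
    return [words[c] for c in sorted(words)]


SAMPLE = [
    ("MUL r10, a, b", "r10", ["a", "b"]),
    ("MUL r12, d, e", "r12", ["d", "e"]),
    ("ADD r11, r10, c", "r11", ["r10", "c"]),
    ("ADD r13, r11, r12", "r13", ["r11", "r12"]),
]

for cycle, word in enumerate(asap_schedule(SAMPLE)):
    print(cycle, word)
```

The four operations need three long instructions; the second and third leave slots that the compiler would fill with nops.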
VLIW Processor:
Very long instruction word means that the program is recompiled into long instructions that run in sequence without stalls in the pipeline.
There is no need for the hardware to examine the instruction stream to determine which instructions may be executed in parallel.
Each long instruction specifies several independent operations that the hardware executes in parallel.
Each operation is equivalent to one instruction in a superscalar or purely sequential processor.
The number of operations in a VLIW instruction is equal to the number of execution units in the processor.
Each operation specifies the instruction that will be executed on the corresponding execution unit in the cycle that the VLIW instruction is issued.
VLIW Processor compiler:
The compiler is responsible for ensuring that all operations in a VLIW instruction can be executed on their execution units at the same time.
The compiler can predict exactly how many cycles will elapse between the execution of two operations by counting the number of VLIW instructions between them.
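Because each long instruction issues in its own cycle, the count is simply a difference of positions in the instruction stream. A minimal sketch, assuming a fixed issue rate of one long instruction per cycle and no stalls; the schedule below is one possible nop-padded schedule of the sample code, and the function name is hypothetical.

```python
def cycles_between(words, op_a, op_b):
    """Cycles from the issue of op_a to the issue of op_b.

    Assumes one long instruction issues per cycle and the pipeline never
    stalls, so the answer is fixed at compile time.
    """
    def index_of(op):
        for i, word in enumerate(words):
            if op in word:
                return i
        raise ValueError(f"{op!r} is not in any long instruction")

    return index_of(op_b) - index_of(op_a)


WORDS = [
    ["MUL r10, a, b", "MUL r12, d, e"],
    ["ADD r11, r10, c", "nop"],
    ["ADD r13, r11, r12", "nop"],
]
print(cycles_between(WORDS, "MUL r10, a, b", "ADD r13, r11, r12"))  # 2
```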
In the above instruction bundle, the compiler generates:
ADD r1, r2, r3
SUB r16, r14, r7
ADD r9, r10, r11
SUB r12, r2, r14
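The four operations above write r1, r16, r9 and r12, and none of them reads a register that another one writes, which is what allows the compiler to group them. A minimal conservative check in Python; the helper names are hypothetical, and the parser assumes the `OP rd, rs, rt` form used above.

```python
def parse(op):
    """'ADD r1, r2, r3' -> ('r1', {'r2', 'r3'})"""
    regs = [r.strip() for r in op.split(None, 1)[1].split(",")]
    return regs[0], set(regs[1:])


def can_share_word(ops):
    """Conservative test: no op writes a register that another op in the
    group reads or writes (no RAW, WAR or WAW dependence inside the group)."""
    parsed = [parse(op) for op in ops]
    for i, (dest_i, _) in enumerate(parsed):
        for j, (dest_j, srcs_j) in enumerate(parsed):
            if i != j and (dest_i == dest_j or dest_i in srcs_j):
                return False
    return True


bundle = ["ADD r1, r2, r3", "SUB r16, r14, r7", "ADD r9, r10, r11", "SUB r12, r2, r14"]
print(can_share_word(bundle))                                # True
print(can_share_word(["ADD r1, r2, r3", "SUB r4, r1, r5"]))  # False: r1 is written, then read
```

The check is stricter than necessary: on machines where every slot reads its operands before any slot writes, a WAR pair could still share a word.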
• Predicated execution is supported by 64 one-bit predicate registers.
• A compare instruction can set two predicate registers at once.
Example:
cmp.eq r1, r2, p1, p2
(p1) sub 59, r10, r11
(p2) add r5, r6, r7
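The compare sets p1 to the result of `r1 == r2` and p2 to its complement; both predicated operations can then be issued, but only the one whose predicate is 1 changes state, so no branch is needed. A minimal Python model; the destination of `sub` is passed in as `sub_dest` because it is unclear in the text, and the function name and the name `"rd"` are hypothetical.

```python
def run_predicated(regs, sub_dest):
    """Model of:  cmp.eq r1, r2, p1, p2
                  (p1) sub <sub_dest>, r10, r11
                  (p2) add r5, r6, r7
    regs maps register names to values; a new dict is returned."""
    p1 = regs["r1"] == regs["r2"]  # one compare sets two predicates at once
    p2 = not p1                    # p2 is the complement of p1
    out = dict(regs)
    # Both operations are issued; a false predicate suppresses the write.
    if p1:
        out[sub_dest] = regs["r10"] - regs["r11"]
    if p2:
        out["r5"] = regs["r6"] + regs["r7"]
    return out


regs = {"r1": 4, "r2": 4, "r5": 0, "r6": 2, "r7": 3, "r10": 9, "r11": 5}
out = run_predicated(regs, "rd")  # r1 == r2, so only sub writes: rd = 4, r5 stays 0
print(out["rd"], out["r5"])
```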
Although all these architectural features contribute to the high effective performance of instruction execution, they complicate the design and implementation of the compiler.

