What is the proper way to write x86 inline assembly for this case:
multiplication without using mul/imul, using hexadecimal numbers for the "shl" instruction?
Solution
If we take a look at the timing for the multiply instruction, we will notice that the execution time for this instruction is rather long. Only the div and idiv instructions take longer on the 8086. When multiplying by a constant, we can avoid the performance penalty of the mul and imul instructions by using shifts, additions, and subtractions to perform the multiplication.
A shl by one bit position multiplies the specified operand by two. Shifting to the left two bit positions multiplies the operand by four. Shifting to the left three bit positions multiplies the operand by eight. In general, shifting an operand to the left n bits multiplies it by 2^n. Any value can be multiplied by some constant using a series of shifts and adds or shifts and subtractions. For example, to multiply the ax register by ten, you need only multiply it by eight and then add in two times the original value. That is, 10*ax = 8*ax + 2*ax. The code to accomplish this is:
shl ax, 1 ;Multiply AX by two
mov bx, ax ;Save 2*AX for later
shl ax, 1 ;Multiply AX by four
shl ax, 1 ;Multiply AX by eight
add ax, bx ;Add in 2*AX to get 10*AX
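For readers who want to try this from C, here is a minimal sketch of the same multiply-by-ten sequence. It assumes GCC or Clang extended inline assembly (AT&T syntax) on an x86 target, and it writes the shift counts as hexadecimal immediates ($0x1, $0x2). On any other compiler or CPU it falls back to plain C shifts. The function name mul10 and the uint16_t stand-in for ax are illustrative choices, not part of the original listing.

```c
#include <stdint.h>

/* Multiply a 16-bit value by ten without mul/imul: 10*x = 8*x + 2*x.
   The result wraps modulo 65536, just as it would in the ax register. */
static uint16_t mul10(uint16_t x)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint16_t twice;                 /* plays the role of bx */
    __asm__ (
        "shlw $0x1, %w0\n\t"        /* x = x * 2                       */
        "movw %w0, %w1\n\t"         /* twice = 2 * x, saved for later  */
        "shlw $0x2, %w0\n\t"        /* x = x * 8 (two more bit shifts) */
        "addw %w1, %w0"             /* x = 8*x + 2*x = 10*x            */
        : "+r"(x), "=&r"(twice)     /* x is read and written           */
        :
        : "cc");                    /* shl and add modify the flags    */
    return x;
#else
    /* Portable fallback: the same arithmetic with hexadecimal shift counts. */
    return (uint16_t)(((uint32_t)x << 0x3) + ((uint32_t)x << 0x1));
#endif
}
```

With this sketch, mul10(7) returns 70. The single shlw $0x2 replaces the two one-bit shifts of the listing above, because the original 8086 only accepts a shift count of 1 or cl; immediate counts greater than one need an 80186 or later.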
The ax register (or just about any register, for that matter) can be multiplied by most constant values much faster using shl than by using the mul instruction. This may seem hard to believe since it only takes two instructions to compute this product:
mov bx, 10
mul bx
However, if we look at the timings, the shift-and-add example above requires fewer clock cycles on most processors in the 80x86 family than the mul instruction. The code is somewhat larger (by a few bytes), but the performance improvement is usually worth it. On the later 80x86 processors the mul instruction is quite a bit faster than on the earlier ones, but the shift-and-add scheme is generally faster on these processors as well.
We can also use subtraction with shifts to perform a multiplication operation. Consider the following multiplication by seven:
mov bx, ax ;Save AX*1
shl ax, 1 ;AX := AX*2
shl ax, 1 ;AX := AX*4
shl ax, 1 ;AX := AX*8
sub ax, bx ;AX := AX*7
This follows directly from the fact that ax*7 = (ax*8) - ax.
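The same idea extends to any constant: each set bit n of the constant contributes one copy of the operand shifted left by n. The C sketch below models this on a 16-bit value standing in for ax. The names mul7 and mul_shift_add are hypothetical helpers, and the loop only illustrates the decomposition; it is not meant as a faster replacement for mul.

```c
#include <stdint.h>

/* Multiply by seven as (x * 8) - x, with a hexadecimal shift count.
   The result wraps modulo 65536, like a 16-bit register. */
static uint16_t mul7(uint16_t x)
{
    return (uint16_t)(((uint32_t)x << 0x3) - x);
}

/* Multiply x by an arbitrary 16-bit constant k using only shifts and adds:
   for every set bit n in k, add x shifted left by n. */
static uint16_t mul_shift_add(uint16_t x, uint16_t k)
{
    uint32_t acc = 0;
    for (unsigned bit = 0; bit < 16; ++bit) {
        if (k & (1u << bit)) {
            acc += (uint32_t)x << bit;   /* add x * 2^bit */
        }
    }
    return (uint16_t)acc;                /* keep the low 16 bits */
}
```

For example, mul7(6) and mul_shift_add(6, 7) both return 42, and mul_shift_add(123, 10) returns 1230, since 10 has bits 1 and 3 set (2*123 + 8*123).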
Thank you.
