I asked Gemini to draw two giraffes, one with a 32-bit neck, and one with a 64-bit neck. This is what I got - truly we live in wonderful times.
I’ve been writing a library of integer math routines for RISC-V processors which lack instructions for things like multiplication and division. I thought I’d share a neat trick for sign extending 32-bit values to 64-bit values using the minimal instruction set.
I won’t go into the details of two’s complement integer representation here; I assume the reader is familiar. If you aren’t, there’s a Wikipedia page I’m sure, or you could ask Claude to explain it to you. Suffice to say that in two’s complement the leftmost bit of a number is the “sign bit”, and if the number is negative (sign bit is 1), all the leading bits of the number are 1’s rather than zeroes. So a 4-bit positive 1 is 0001b, and a 4-bit negative 1 is 1111b.
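As a quick sanity check of those bit patterns, here is a minimal Python sketch. The helper name `to_twos_complement` is made up for illustration; it simply masks a Python integer down to a fixed width.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Return the two's complement bit pattern of `value` in a field `bits` wide."""
    mask = (1 << bits) - 1              # e.g. 0b1111 for a 4-bit field
    return format(value & mask, f"0{bits}b")

print(to_twos_complement(1, 4))     # 0001
print(to_twos_complement(-1, 4))    # 1111
print(to_twos_complement(-1, 8))    # 11111111
```

Widening -1 from 4 bits to 8 bits just adds more leading 1's, which is exactly what sign extension has to do.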
So, if you perform, say, a 32-bit by 32-bit multiply and provide a 32-bit result, but then store it in a 64-bit register, you need to somehow “fill in all the ones” in the top bits of that 64-bit register if the 32-bit value’s leftmost bit is a 1.
In x86_64 assembly language, you have a multiplicity of ways to extend the sign of a number. Of course, some of them are variants of the MOV instruction: MOVSX (Move with Sign-Extension) and MOVSXD (Move with Sign-Extension Doubleword). This is unsurprising as the x86_64 MOV instruction is known to be Turing complete.
There are a variety of other x86_64 instructions to do variants of this operation - CBW (Convert Byte to Word), CWDE (Convert Word to Doubleword Extended), CDQE (Convert Doubleword to Quadword Extended), CLTQ (Convert Long to Quad, the AT&T-syntax name for CDQE), CWD (Convert Word to Doubleword), CDQ (Convert Doubleword to Quadword), CQO (Convert Quadword to Octoword).
But in RISC-V, in keeping with the RISC philosophy, there are exactly zero instructions to perform this operation.
So here’s the RISC-V idiom to perform this operation:
slli t0, t0, 32 # sign extend 32->64
srai t0, t0, 32
You perform a logical left shift of the 32-bit quantity in the 64-bit register by 32 bits. A “logical” shift left inserts zeroes on the right, so the low-order 32 bits become zero and the 32-bit value’s sign bit (bit 31) lands in bit 63. If your register contained 0x00000000FFFFAAAA before the shift, afterwards it would contain 0xFFFFAAAA00000000. The next instruction is a shift right “arithmetic” by 32 bits. An “arithmetic” shift right inserts zeroes on the left if the high bit is zero, or ones if the high bit is a one. So in this case it would result in 0xFFFFFFFFFFFFAAAA.
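To watch the two shifts work on concrete values, here is a rough Python model of a 64-bit register. It is only a sketch: the functions `slli` and `srai` borrow the instruction names but are plain Python, and `sign_extend_32_to_64` is a made-up helper name.

```python
MASK64 = (1 << 64) - 1   # registers are modeled as unsigned 64-bit integers

def slli(reg: int, shamt: int) -> int:
    """Shift left logical: zeroes enter on the right, bits pushed past bit 63 are lost."""
    return (reg << shamt) & MASK64

def srai(reg: int, shamt: int) -> int:
    """Shift right arithmetic: copies of bit 63 enter on the left."""
    if reg & (1 << 63):
        reg -= 1 << 64           # reinterpret the pattern as a negative Python int
    return (reg >> shamt) & MASK64

def sign_extend_32_to_64(reg: int) -> int:
    return srai(slli(reg, 32), 32)

print(hex(slli(0x00000000FFFFAAAA, 32)))                 # 0xffffaaaa00000000
print(hex(sign_extend_32_to_64(0x00000000FFFFAAAA)))     # 0xffffffffffffaaaa
print(hex(sign_extend_32_to_64(0x000000007FFFAAAA)))     # 0x7fffaaaa
```

The last line shows the positive case: bit 31 is zero, so the arithmetic shift fills with zeroes and the value comes back unchanged.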
It’s a very handy idiom. I’m enjoying not having so many instructions to learn, but one drawback is I don’t get to deal with instructions that refer to octowords.
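A reader points out that if you have the 64-bit instruction set (RV64I), you gain the ADDW and ADDIW instructions, which produce a 32-bit result sign-extended to 64 bits. That makes them an even more efficient way to sign extend in the particular case of 32 to 64-bit extension: one instruction instead of two. (The assembler also accepts the pseudo-instruction sext.w rd, rs, which expands to addiw rd, rs, 0.) For example you could do: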
addw t0, t0, x0
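Why adding zero sign-extends is easier to see in a small model. The Python below is a sketch of ADDW's behavior as described here (add, keep the low 32 bits, sign-extend to 64), not an emulator; `addw` and `X0` are illustrative names.

```python
def addw(rs1: int, rs2: int) -> int:
    """Model of ADDW: add, keep the low 32 bits, sign-extend the result to 64 bits."""
    low32 = (rs1 + rs2) & 0xFFFFFFFF
    if low32 & 0x80000000:               # bit 31 set: fill bits 63..32 with ones
        low32 |= 0xFFFFFFFF00000000
    return low32

X0 = 0   # the always-zero register

print(hex(addw(0x00000000FFFFAAAA, X0)))   # 0xffffffffffffaaaa
print(hex(addw(0xDEADBEEF7FFFAAAA, X0)))   # 0x7fffaaaa
```

The second call shows that whatever was in the upper 32 bits of the source register is discarded, so the result depends only on the low 32 bits.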
I have noticed some complexity creeping into the instruction set here and there. For example the RV64M extension has a MULW instruction that multiplies a 32-bit number by a 32-bit number and provides a 32-bit sign-extended result in a 64-bit register. This could have been accomplished by using the MUL instruction followed by SLLI and SRAI. I’m slightly disappointed by my minimalist overlords.
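The equivalence between MULW and the three-instruction sequence can be checked with a small Python model. This is a sketch under the assumption that MUL returns the low 64 bits of the product and MULW the sign-extended low 32 bits; all function names are illustrative.

```python
MASK64 = (1 << 64) - 1   # registers are modeled as unsigned 64-bit integers

def slli(reg: int, shamt: int) -> int:
    return (reg << shamt) & MASK64

def srai(reg: int, shamt: int) -> int:
    if reg & (1 << 63):
        reg -= 1 << 64           # treat the pattern as negative so >> fills with ones
    return (reg >> shamt) & MASK64

def mul(rs1: int, rs2: int) -> int:
    """Model of MUL: low 64 bits of the product."""
    return (rs1 * rs2) & MASK64

def mulw(rs1: int, rs2: int) -> int:
    """Model of MULW: multiply the low 32 bits, sign-extend the 32-bit result."""
    low32 = ((rs1 & 0xFFFFFFFF) * (rs2 & 0xFFFFFFFF)) & 0xFFFFFFFF
    if low32 & 0x80000000:
        low32 |= 0xFFFFFFFF00000000
    return low32

def mul_slli_srai(rs1: int, rs2: int) -> int:
    """The three-instruction alternative: MUL, then SLLI 32, then SRAI 32."""
    return srai(slli(mul(rs1, rs2), 32), 32)

a, b = 0xFFFFFFFFFFFFFFFD, 5          # -3 and 5 as 64-bit register contents
print(hex(mulw(a, b)))                # 0xfffffffffffffff1, i.e. -15
print(hex(mul_slli_srai(a, b)))       # 0xfffffffffffffff1
```

Both paths agree because the low 32 bits of a product depend only on the low 32 bits of the operands.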
Another common idiom in RISC-V is a slight variation of this; if you want to ensure the upper 32 bits of a 64-bit register are zero, this is an efficient way to do it:
slli t1, t1, 32
srli t1, t1, 32
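Here is a minimal Python sketch of that zero-extension pair, modeling the register as an unsigned 64-bit integer. The function names mirror the instructions but are plain Python, and `zero_extend_32_to_64` is a made-up helper name.

```python
MASK64 = (1 << 64) - 1   # registers are modeled as unsigned 64-bit integers

def slli(reg: int, shamt: int) -> int:
    """Shift left logical: zeroes enter on the right."""
    return (reg << shamt) & MASK64

def srli(reg: int, shamt: int) -> int:
    """Shift right logical: zeroes enter on the left, regardless of bit 63."""
    return (reg & MASK64) >> shamt

def zero_extend_32_to_64(reg: int) -> int:
    return srli(slli(reg, 32), 32)

print(hex(zero_extend_32_to_64(0xFFFFFFFFFFFFAAAA)))   # 0xffffaaaa
print(hex(zero_extend_32_to_64(0x0000000012345678)))   # 0x12345678
```

The only difference from the sign-extension idiom is the second shift: logical instead of arithmetic, so the upper half is always filled with zeroes.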
Why prefer this to writing andi t1, t1, 0xFFFFFFFF? Because immediate operands are limited to 12 bits in RISC-V, so that mask doesn’t fit in an ANDI, and you would have to build it in a register first with a code sequence like this:
addi t0, x0, -1
srli t0, t0, 32
and t1, t1, t0
Which uses one more register and one more instruction to accomplish the same thing. Even this contains a trick to shorten it - the addi instruction can only add a 12-bit immediate number in the range -2048 to 2047. By loading -1, we take advantage of - you guessed it - sign extension to fill every bit above the 12-bit immediate with 1’s (the top 52 bits of a 64-bit register), because the most significant bit of the -1 is a 1. And to do the “load” we actually used an addition instruction with the x0 register, which is “always zero” in the RISC-V architecture - thus getting rid of the need for a separate load instruction! Without this trick, it would actually take 4 instructions.
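The immediate trick can be sketched in Python as well. This is a simplified model of a 64-bit ADDI, assuming the 12-bit immediate is sign-extended before the add; `sign_extend_imm12`, `addi` and `X0` are illustrative names.

```python
MASK64 = (1 << 64) - 1   # registers are modeled as unsigned 64-bit integers

def sign_extend_imm12(imm12: int) -> int:
    """Sign-extend a 12-bit immediate field to a 64-bit register value."""
    imm12 &= 0xFFF
    if imm12 & 0x800:            # bit 11 is the immediate's sign bit
        imm12 -= 1 << 12
    return imm12 & MASK64

def addi(rs1: int, imm: int) -> int:
    """Model of ADDI: add a sign-extended 12-bit immediate to a register."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    return (rs1 + sign_extend_imm12(imm)) & MASK64

X0 = 0   # the always-zero register

print(hex(addi(X0, -1)))      # 0xffffffffffffffff
print(hex(addi(X0, 2047)))    # 0x7ff
```

The -1 is encoded as the 12-bit pattern 0xFFF, and sign extension turns it into a register full of ones with a single instruction.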