Part 9 – Linux ARM assembly syntax

Almost all arithmetic instructions on AArch64 take three operands: one destination register and two source registers.

For example:

add w5, w3, w4 // w5 ← w3 + w4

Or:

add x5, x3, x4 // x5 ← x3 + x4

The 32nd general-purpose register (the zero register, wzr or xzr) can also be used as an operand, and the second source operand may be an immediate:

add w0, w1, wzr // w0 ← w1 + 0
add w0, w1, #2 // w0 ← w1 + 2
add w0, w1, #-2 // w0 ← w1 + (-2)

Note that only the second source operand may be an immediate.

The same applies to subtraction (sub).

Note that 32-bit ARM also has RSB (Reverse Subtract), which swaps the minuend and the subtrahend compared with sub: sub computes Rn − Op2, while rsb computes Op2 − Rn.
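To make the operand rules concrete, here is a minimal Python sketch that models add, sub and rsb on 32-bit values. The helper names add32, sub32 and rsb32 are illustrative assumptions, not part of any toolchain, and only the wrap-around arithmetic is modelled (no condition flags).

```python
MASK32 = 0xFFFFFFFF  # a W register (or a 32-bit ARM register) holds 32 bits


def add32(rn, op2):
    """Model of `add Wd, Wn, op2`: Wd = Wn + op2, wrapping at 32 bits."""
    return (rn + op2) & MASK32


def sub32(rn, op2):
    """Model of `sub Wd, Wn, op2`: Wd = Wn - op2."""
    return (rn - op2) & MASK32


def rsb32(rn, op2):
    """Model of 32-bit ARM `rsb Rd, Rn, op2`: Rd = op2 - Rn (operands reversed)."""
    return (op2 - rn) & MASK32


w1 = 10
print(add32(w1, 0))        # add w0, w1, wzr  -> 10
print(add32(w1, 2))        # add w0, w1, #2   -> 12
print(add32(w1, -2))       # add w0, w1, #-2  -> 8
print(sub32(w1, 3))        # 10 - 3           -> 7
print(hex(rsb32(w1, 3)))   # 3 - 10 = -7      -> 0xfffffff9
```

The last line shows why the order matters: the same two operands give 7 with sub and the two's complement encoding of -7 with rsb.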

Multiplication

32-bit ARM provides mul, smull and umull.

mul does not update the CPSR.

smull multiplies its operands as signed (two's complement) values.

umull multiplies its operands as unsigned values.

Syntax:

{s,u}mull RdestLower, RdestHigher, Rsource1, Rsource2

Multiplication example

A 64-bit number is split into two parts: the upper 32 bits and the lower 32 bits.

$n = 2^{32} \times n_{higher} + n_{lower}$

Example: $(2^{32} \times 2 + 3755744309) \times 12345678$

Let $z$ be the product of $x$ and $y$:

$z = x \times y = (2^{32} \times x_1 + x_0) \times (2^{32} \times y_1 + y_0)$

So $z = 2^{64} \times x_1 \times y_1 + 2^{32} \times (x_0 \times y_1 + x_1 \times y_0) + x_0 \times y_0$
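The decomposition above can be checked with a short Python sketch. This is only a model under stated assumptions: umull and mult64 here are illustrative Python functions that mimic the 32-bit instruction and the routine of the same name, and the result is truncated to 64 bits, so the $2^{64} \times x_1 \times y_1$ term never needs to be computed.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def umull(a, b):
    """Model of `umull lo, hi, a, b`: unsigned 32x32 -> 64-bit product as (lo, hi)."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product >> 32


def mult64(x, y):
    """Multiply two 64-bit values using only 32-bit halves; result modulo 2**64."""
    x0, x1 = x & MASK32, (x >> 32) & MASK32
    y0, y1 = y & MASK32, (y >> 32) & MASK32
    lo, hi_of_low = umull(x0, y0)     # x0 * y0 contributes to both halves
    cross1, _ = umull(x0, y1)         # only the low 32 bits land below bit 64
    cross2, _ = umull(x1, y0)
    hi = (cross1 + cross2 + hi_of_low) & MASK32   # carries past bit 63 are dropped
    return (hi << 32) | lo


a = (2 << 32) + 3755744309   # 12345678901
b = 12345678
print(a)                # 12345678901
print(mult64(a, b))     # 152415776403139878
```

Carries out of the upper half would belong to bit 64 and above, which is why plain additions without carry propagation are enough for that half.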

.data

.align 4
message: .asciz "Multiplication of %lld by %lld is %lld\n"

.align 8
number_a_low: .word 3755744309
number_a_high: .word 2

.align 8
number_b_low: .word 12345678
number_b_high: .word 0

.text

/* Not an efficient way to do a 64-bit multiplication; it is here to illustrate the multiply instructions. */
mult64:
    /* The arguments are passed in {r0,r1} and {r2,r3}; the result is returned in {r0,r1} */
    /* Save the registers that will be overwritten */
    push {r4, r5, r6, r7, r8, lr}
    /* Move {r0,r1} to {r4,r5} */
    mov r4, r0                      /* r4 ← r0 (x0) */
    mov r5, r1                      /* r5 ← r1 (x1) */

    umull r0, r6, r2, r4            /* {r0,r6} ← r2 * r4 (x0 × y0) */
    umull r7, r8, r3, r4            /* {r7,r8} ← r3 * r4 (x0 × y1) */
    umull r4, r5, r2, r5            /* {r4,r5} ← r2 * r5 (x1 × y0) */
    add r2, r7, r4                  /* r2 ← r7 + r4; a carry here would belong to bit 64 and is dropped */
    add r1, r2, r6                  /* r1 ← r2 + r6, the upper 32 bits; x1 × y1 is not needed */

    /* Restore the registers */
    pop {r4, r5, r6, r7, r8, lr}
    bx lr                           /* Leave mult64 */

.global main
main:
    push {r4, r5, r6, r7, r8, lr}   /* Save the registers */
    /* Load the numbers from memory */
    /* {r4,r5} ← a */
    ldr r4, addr_number_a_low       /* r4 ← &a_low */
    ldr r4, [r4]                    /* r4 ← *r4 */
    ldr r5, addr_number_a_high      /* r5 ← &a_high */
    ldr r5, [r5]                    /* r5 ← *r5 */

    /* {r6,r7} ← b */
    ldr r6, addr_number_b_low       /* r6 ← &b_low */
    ldr r6, [r6]                    /* r6 ← *r6 */
    ldr r7, addr_number_b_high      /* r7 ← &b_high */
    ldr r7, [r7]                    /* r7 ← *r7 */

    /* The first argument is passed in {r0,r1}, the second in {r2,r3} */
    mov r0, r4                      /* r0 ← r4 */
    mov r1, r5                      /* r1 ← r5 */
    mov r2, r6                      /* r2 ← r6 */
    mov r3, r7                      /* r3 ← r7 */

    bl mult64                       /* Call the mult64 function */
    /* The result is in {r0,r1} */

    /* Now prepare the call to printf */
    /* We have to pass &message, {r4,r5}, {r6,r7} and {r0,r1} */
    push {r1}                       /* 4th (higher) parameter */
    push {r0}                       /* 4th (lower) parameter */
    push {r7}                       /* 3rd (higher) parameter */
    push {r6}                       /* 3rd (lower) parameter */
    mov r3, r5                      /* r3 ← r5. 2nd (higher) parameter */
    mov r2, r4                      /* r2 ← r4. 2nd (lower) parameter */
    ldr r0, addr_of_message         /* r0 ← &message. 1st parameter; r1 is skipped */
    bl printf                       /* Call the printf function */
    add sp, sp, #16                 /* sp ← sp + 16 */
                                    /* Discard the four words we pushed above */

    mov r0, #0                      /* r0 ← 0 */
    pop {r4, r5, r6, r7, r8, lr}    /* Restore the registers */
    bx lr                           /* Leave main */

addr_of_message: .word message
addr_number_a_low: .word number_a_low
addr_number_a_high: .word number_a_high
addr_number_b_low: .word number_b_low
addr_number_b_high: .word number_b_high

as -g -o mult64.o mult64.s

gcc -o mult64 mult64.o

Run the program:

$./mult64

Multiplication of 12345678901 by 12345678 is 152415776403139878

Printing is a little more complicated. A 64-bit value must be passed in a pair of consecutive registers, with the lower half in the even-numbered register. This is why r1 is skipped in the call to printf.

The first four arguments of a function are passed in r0, r1, r2 and r3, in that order. Once those four registers are used up, the stack is used. This is simply a calling convention.
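As a rough illustration of this convention, the sketch below assigns a list of integer arguments to registers and stack slots. It is a simplified model written for this article, not the full procedure call standard: assign_args is a hypothetical helper, it only handles 4-byte and 8-byte integer arguments, and it assumes that once one argument goes to the stack all later ones do too.

```python
def assign_args(sizes):
    """Place 4- or 8-byte integer arguments in r0-r3, then on the stack."""
    placements = []
    next_reg = 0        # number of the next free core register
    stack_offset = 0    # next free byte offset in the stack argument area
    for size in sizes:
        words = size // 4
        if words == 2 and next_reg % 2 == 1:
            next_reg += 1                     # 64-bit value: skip the odd register
        if next_reg + words <= 4:
            regs = [f"r{next_reg + i}" for i in range(words)]
            placements.append("{" + ",".join(regs) + "}")
            next_reg += words
        else:
            next_reg = 4                      # registers are exhausted from here on
            if words == 2 and stack_offset % 8:
                stack_offset += 4             # keep 64-bit values 8-byte aligned
            placements.append(f"[sp+{stack_offset}]")
            stack_offset += size
    return placements


# printf(&message, a, b, a*b): one pointer followed by three 64-bit integers
print(assign_args([4, 8, 8, 8]))   # ['{r0}', '{r2,r3}', '[sp+0]', '[sp+8]']
```

The result matches the hand-written call: the format string goes in r0, r1 is skipped, the first number travels in {r2,r3}, and the remaining two occupy 16 bytes of stack.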

 

Example of 64-bit multiplication

Moving the 32-bit multiplication above to a 64-bit machine simplifies it considerably, because the registers are 64 bits wide.

Example: $(2^{32} \times 2 + 3755744309) \times 12345678$

Keeping the 32-bit logic, the AArch64 version is as follows:

.data

.align 8
message: .asciz "multi of %lld,%lld by %lld,%lld is %lld\n"

.align 8
number_a_low: .word 3755744309
number_a_high: .word 2

.align 8
number_b_low: .word 12345678
number_b_high: .word 0

.text

/* Not an efficient way to multiply 64-bit numbers. */
.type mult64, %function
mult64:
    /* The arguments are passed in w0, w1 (a) and w2, w3 (b); the result is returned in x0 */
    /* Move {w0,w1} to {w4,w5} */
    mov w4, w0                      /* w4 ← w0 (x0) */
    mov w5, w1                      /* w5 ← w1 (x1) */

    umulh x6, x2, x4                /* x6 ← bits 64..127 of x2 * x4; zero here, both operands fit in 32 bits */
    umull x0, w2, w4                /* x0 ← w2 * w4 (x0 × y0), full 64-bit product */
    umull x7, w3, w4                /* x7 ← w3 * w4 (x0 × y1) */
    umull x4, w2, w5                /* x4 ← w2 * w5 (x1 × y0) */
    add x2, x7, x4                  /* x2 ← x7 + x4, sum of the cross products */
    add x1, x2, x6                  /* x1 ← x2 + x6; x1 × y1 is not needed */
    lsl x1, x1, #32                 /* x1 ← x1 << 32 */
    add x0, x0, x1                  /* x0 ← x0 + x1 */
    ret

.global _start
_start:
    /* {w4,w5} ← a */
    ldr x4, addr_number_a_low       /* x4 ← &a_low */
    ldr w4, [x4]                    /* w4 ← *x4 */
    ldr x5, addr_number_a_high      /* x5 ← &a_high */
    ldr w5, [x5]                    /* w5 ← *x5 */

    /* {w6,w7} ← b */
    ldr x6, addr_number_b_low       /* x6 ← &b_low */
    ldr w6, [x6]                    /* w6 ← *x6 */
    ldr x7, addr_number_b_high      /* x7 ← &b_high */
    ldr w7, [x7]                    /* w7 ← *x7 */

    /* The first argument is passed in {w0,w1}, the second in {w2,w3} */
    mov w0, w4                      /* w0 ← w4 */
    mov w1, w5                      /* w1 ← w5 */
    mov w2, w6                      /* w2 ← w6 */
    mov w3, w7                      /* w3 ← w7 */

    bl mult64                       /* Call the mult64 function */
    /* The result is in x0 */

    /* Prepare the call to printf */
    mov x5, x0                      /* x5 ← product */
    ldr x2, addr_number_a_low
    ldr x2, [x2]                    /* 64-bit load: reads a_low and a_high together */
    ldr x1, addr_number_a_high
    ldr x1, [x1]
    ldr x3, addr_number_b_low
    ldr x3, [x3]                    /* 64-bit load: reads b_low and b_high together */
    ldr x4, addr_number_b_high
    ldr x4, [x4]
    ldr x0, addr_of_message         /* x0 ← &message */
    bl printf                       /* Call the printf function */

    mov x0, #0                      /* x0 ← 0 */
    mov x8, 93                      /* system call number 93 (exit) */
    svc 0

addr_of_message: .dword message
addr_number_a_low: .dword number_a_low
addr_number_a_high: .dword number_a_high
addr_number_b_low: .dword number_b_low
addr_number_b_high: .dword number_b_high

as -g -o mult64.o mult64.s

ld -o mult64 mult64.o -lc -I /lib64/ld-linux-aarch64.so.1

On ARM64, arguments 1 through 8 are passed in registers X0 to X7. Any remaining arguments are placed on the stack, pushed from right to left, and the return value is stored in X0. The callee must return with SP unchanged, so it is the caller that releases the stack space used for those arguments.

$./mult64

If the program runs correctly, it prints:

multi of 2,12345678901 by 12345678,0 is 152415776403139878

64-bit simplified logic

The simplified 64-bit logic is as follows:

.data

.align 8
message: .asciz "Multiplication of %lld by %lld is %lld\n"

.align 8
number_a: .dword 12345678901

.align 8
number_b: .dword 12345678

.text

/* With 64-bit registers a single mul is enough. */
.type mult64, %function
mult64:
    mul x0, x0, x1                  /* x0 ← x0 * x1 */
    ret

.global _start
_start:
    /* x4 ← a */
    ldr x4, addr_number_a           /* x4 ← &a */
    ldr x4, [x4]                    /* x4 ← *x4 */

    /* x6 ← b */
    ldr x6, addr_number_b           /* x6 ← &b */
    ldr x6, [x6]                    /* x6 ← *x6 */

    /* The arguments are passed in x0 and x1 */
    mov x0, x4                      /* x0 ← x4 */
    mov x1, x6                      /* x1 ← x6 */

    bl mult64                       /* Call the mult64 function */
    /* The result is in x0 */

    /* Prepare the call to printf */
    mov x3, x0                      /* x3 ← product */
    mov x1, x4                      /* x1 ← a */
    mov x2, x6                      /* x2 ← b */
    ldr x0, addr_of_message         /* x0 ← &message */
    bl printf                       /* Call the printf function */

    mov x0, #0                      /* x0 ← 0 */
    mov x8, 93                      /* system call number 93 (exit) */
    svc 0

addr_of_message: .dword message
addr_number_a: .dword number_a
addr_number_b: .dword number_b

as -g -o mult64s.o mult64s.s

ld -o mult64s mult64s.o -lc -I /lib64/ld-linux-aarch64.so.1

The results are as follows:

$./mult64s

Multiplication of 12345678901 by 12345678 is 152415776403139878

 

Division

Most computers perform truncating integer division, in which the remainder has the same sign as the numerator.

Unsigned integer division divides an unsigned integer N by an unsigned integer D. The quotient Q and the remainder R are never negative.
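The sign rule is easy to get wrong when checking results in a language that rounds differently. The following Python sketch (trunc_divmod is an illustrative helper, not a library function) implements truncating division and contrasts it with Python's built-in divmod, which rounds toward negative infinity.

```python
def trunc_divmod(n, d):
    """Divide n by d, truncating toward zero; the remainder takes the sign of n."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    r = n - q * d          # by construction n == q * d + r
    return q, r


print(trunc_divmod(7, 2))     # (3, 1)
print(trunc_divmod(-7, 2))    # (-3, -1)
print(trunc_divmod(7, -2))    # (-3, 1)
print(divmod(-7, 2))          # (-4, 1): Python floors instead of truncating
```

In every case N = Q × D + R holds; only the rounding direction of Q, and therefore the sign of R, differs.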

ARMv6 has no integer division instruction (only floating-point division), but ARMv8 does.

The division instructions are listed below.

Integer division

SDIV Wd, Wn, Wm

Signed Divide: Wd = Wn ÷ Wm, treating source operands as signed.

SDIV Xd, Xn, Xm

Signed Divide (extended): Xd = Xn ÷ Xm, treating source operands as signed.

UDIV Wd, Wn, Wm

Unsigned Divide: Wd = Wn ÷ Wm, treating source operands as unsigned.

UDIV Xd, Xn, Xm

Unsigned Divide (extended): Xd = Xn ÷ Xm, treating source operands as unsigned.
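The difference between the signed and unsigned forms shows up when the same bit pattern is fed to both. Below is a small Python model of the two instructions; the function names and the bits parameter are assumptions made for this sketch. It also models two architectural corner cases: dividing by zero yields 0 rather than trapping, and the signed overflow case (most negative value divided by -1) wraps back to the most negative value.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def udiv(n, d, bits=64):
    """Model of UDIV: operands are treated as unsigned."""
    mask = (1 << bits) - 1
    n, d = n & mask, d & mask
    return 0 if d == 0 else n // d


def sdiv(n, d, bits=64):
    """Model of SDIV: operands are treated as signed, quotient truncates toward zero."""
    sn, sd = to_signed(n, bits), to_signed(d, bits)
    if sd == 0:
        return 0
    q = abs(sn) // abs(sd)
    if (sn < 0) != (sd < 0):
        q = -q
    return q & ((1 << bits) - 1)


pattern = 0xFFFFFFF6                      # -10 if signed, 4294967286 if unsigned
print(hex(sdiv(pattern, 3, bits=32)))     # 0xfffffffd, i.e. -3
print(hex(udiv(pattern, 3, bits=32)))     # 0x55555552
print(udiv(5, 0), sdiv(5, 0))             # 0 0
```

Passing bits=32 corresponds to the Wd forms and the default of 64 to the Xd forms.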

64-bit integer division example

.data

.balign 4
message1: .asciz "Hey, type a number: "

.balign 4
message2: .asciz "I read the number %d/%d\n"

.balign 4
message3: .asciz "Result, Q: %d,R: %d\n"

/* Format pattern for scanf */
.balign 4
scan_pattern: .asciz "%d %d"

/* Where scanf will store the numbers read */
.balign 4
number_N: .dword 0
number_D: .dword 0

.balign 4
return: .word 0

.arch armv8-a
.global _start

.text
_start:
    ldr x0, address_of_message1     /* x0 ← &message1 */
    bl printf                       /* Call printf */

    ldr x0, address_of_scan_pattern /* x0 ← &scan_pattern */
    ldr x1, address_of_number_N     /* x1 ← &number_N */
    ldr x2, address_of_number_D     /* x2 ← &number_D */
    bl scanf                        /* Call scanf */

    ldr x0, address_of_message2     /* x0 ← &message2 */
    ldr x1, address_of_number_N     /* x1 ← &number_N */
    ldr x1, [x1]                    /* x1 ← *x1 */
    ldr x2, address_of_number_D     /* x2 ← &number_D */
    ldr x2, [x2]                    /* x2 ← *x2 */
    bl printf                       /* Call printf */

    ldr x0, address_of_message3     /* x0 ← &message3 */
    ldr x1, address_of_number_N     /* x1 ← &number_N */
    ldr x1, [x1]                    /* x1 ← *x1 */
    ldr x2, address_of_number_D     /* x2 ← &number_D */
    ldr x2, [x2]                    /* x2 ← *x2 */
    udiv x3, x1, x2                 /* x3 ← x1 / x2 (quotient) */
    msub x4, x3, x2, x1             /* x4 ← x1 - x3 * x2 (remainder) */
    mov x2, x4                      /* remainder to x2 */
    mov x1, x3                      /* quotient to x1 */
    bl printf                       /* Call printf */

    mov x0, #0                      /* x0 ← 0 */
    mov x8, 93                      /* system call number 93 (exit) */
    svc 0

address_of_message1: .dword message1
address_of_message2: .dword message2
address_of_message3: .dword message3
address_of_scan_pattern: .dword scan_pattern
address_of_number_N: .dword number_N
address_of_number_D: .dword number_D
address_of_return: .dword return

/* External */
.global printf
.global scanf

as -g -o div.o div.s

ld -o div  div.o -lc -I /lib64/ld-linux-aarch64.so.1

The division instruction does not store the remainder in any register, so the remainder is obtained with the MSUB instruction:

MSUB Wd, Wn, Wm, Wa Multiply-Subtract: Wd = Wa – (Wn × Wm)
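The quotient-then-MSUB pattern can be sketched in Python as follows. The functions udiv64 and msub64 are illustrative models of the 64-bit instruction forms, written for this example only.

```python
MASK64 = (1 << 64) - 1


def udiv64(n, d):
    """Model of `udiv Xd, Xn, Xm` (division by zero yields 0)."""
    n, d = n & MASK64, d & MASK64
    return 0 if d == 0 else n // d


def msub64(n, m, a):
    """Model of `msub Xd, Xn, Xm, Xa`: Xd = Xa - (Xn * Xm), modulo 2**64."""
    return (a - n * m) & MASK64


numerator, denominator = 242, 3
quotient = udiv64(numerator, denominator)               # udiv x3, x1, x2
remainder = msub64(quotient, denominator, numerator)    # msub x4, x3, x2, x1
print(quotient, remainder)                              # 80 2
```

The operand order mirrors the assembly: the quotient and the divisor are multiplied, and the product is subtracted from the original numerator.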

The test runs are as follows:

$./div

Hey, type a number: 242 3

I read the number 242/3

Result, Q: 80,R: 2

 

Floating point division

FDIV Sd, Sn, Sm

Single-precision floating-point scalar division: Sd = Sn / Sm.

FDIV Dd, Dn, Dm

Double-precision floating-point scalar division: Dd = Dn / Dm.

FDIV Vd.<T>, Vn.<T>, Vm.<T>

Floating-point divide (vector). Where <T> is 2S, 4S or 2D.
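To see how the single-precision and double-precision forms differ, here is a hedged Python sketch. Python floats are IEEE 754 doubles, so fdiv_d is ordinary division; fdiv_s emulates single precision by rounding the operands and the result through the struct module. The names to_single, fdiv_s and fdiv_d are assumptions made for this illustration.

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdiv_s(n, m):
    """Model of `fdiv Sd, Sn, Sm`: single-precision operands and result."""
    return to_single(to_single(n) / to_single(m))


def fdiv_d(n, m):
    """Model of `fdiv Dd, Dn, Dm`: double-precision division."""
    return n / m


# Values read as single precision (as scanf "%f" would store them), then widened
n, d = to_single(3.5), to_single(1.2)
print(f"{fdiv_d(n, d):5.2f}")     # " 2.92"
print(fdiv_s(3.5, 1.2) == fdiv_d(3.5, 1.2))   # False: the two results differ in the low bits
```

Reading the inputs as single precision and widening them before dividing is one way to feed the double-precision form with values that were parsed as floats.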

64-bit floating-point division example

.data

.balign 4
message1: .asciz "Hey, type a number: "

.balign 4
message2: .asciz "I read the number %5.2f/%5.2f\n"

.balign 4
message3: .asciz "Result, Q: %5.2f\n"

/* Format pattern for scanf */
.balign 4
scan_pattern: .asciz "%f %f"

/* Where scanf will store the numbers read */
.balign 4
number_N: .dword 0
number_D: .dword 0

.balign 4
return: .word 0

.arch armv8-a
.global _start

.text
_start:
    ldr x0, address_of_message1     /* x0 ← &message1 */
    bl printf                       /* Call printf */

    ldr x0, address_of_scan_pattern /* x0 ← &scan_pattern */
    ldr x1, address_of_number_N     /* x1 ← &number_N */
    // ldr d0, address_of_number_N
    ldr x2, address_of_number_D     /* x2 ← &number_D */
    // ldr d1, address_of_number_D
    bl scanf                        /* Call scanf */

    ldr x0, address_of_message2     /* x0 ← &message2 */
    ldr x1, address_of_number_N     /* x1 ← &number_N */
    ldr s0, [x1]                    /* s0 ← *x1 */
    fcvt d0, s0                     /* d0 ← s0 widened to double */
    ldr x2, address_of_number_D     /* x2 ← &number_D */
    ldr s1, [x2]                    /* s1 ← *x2 */
    fcvt d1, s1                     /* d1 ← s1 widened to double */
    bl printf                       /* Call printf */

    ldr x0, address_of_message3     /* x0 ← &message3 */
    ldr x1, address_of_number_N     /* x1 ← &number_N */
    ldr s0, [x1]                    /* s0 ← *x1 */
    fcvt d0, s0                     /* d0 ← s0 widened to double */
    ldr x2, address_of_number_D     /* x2 ← &number_D */
    ldr s1, [x2]                    /* s1 ← *x2 */
    fcvt d1, s1                     /* d1 ← s1 widened to double */
    fdiv d3, d0, d1                 /* d3 ← d0 / d1 */
    fmov d0, d3                     /* quotient to d0 */
    bl printf                       /* Call printf */

    mov x0, #0                      /* x0 ← 0 */
    mov x8, 93                      /* system call number 93 (exit) */
    svc 0

address_of_message1: .dword message1
address_of_message2: .dword message2
address_of_message3: .dword message3
address_of_scan_pattern: .dword scan_pattern
address_of_number_N: .dword number_N
address_of_number_D: .dword number_D
address_of_return: .dword return

/* External */
.global printf
.global scanf

as -g -o fdiv.o fdiv.s

ld -o fdiv  fdiv.o -lc -I /lib64/ld-linux-aarch64.so.1

Execute as follows:

$./fdiv

Hey, type a number: 3.5 1.2

I read the number 3.50/1.20

Result, Q:  2.92

 

Extension operators

An extension operator has the form kxtw, where k is the kind of integer to widen and w is the width of the narrow value. The kind can be U (unsigned) or S (signed, that is, two's complement). The width can be B, H or W, meaning a byte (the lowest 8 bits of the register), a halfword (the lowest 16 bits of the register) or a word (the lowest 32 bits of the register).

This gives the extension operators uxtb, sxtb, uxth, sxth, uxtw and sxtw.

For example, with the add instruction:

add x0, x1, w2, sxtw // x0 ← x1 + ExtendSigned32To64(w2)

When using these extension operators, some context must be considered. For example, the following two instructions have slightly different meanings:

add x0, x1, w2, sxtb // x0 ← x1 + ExtendSigned8To64(w2)

add w0, w1, w2, sxtb // w0 ← w1 + ExtendSigned8To32(w2)
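The difference between the two forms is only visible when the narrow value is negative. The Python sketch below models an extension operator applied to the second source operand; extend, add_x and add_w are hypothetical helper names, and the operator is passed as a string such as "sxtb".

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def extend(value, op, dest_bits):
    """Apply an extension operator (uxtb, sxtb, uxth, sxth, uxtw, sxtw)."""
    signed = op[0] == "s"
    width = {"b": 8, "h": 16, "w": 32}[op[3]]
    narrow = value & ((1 << width) - 1)          # keep only the lowest bits
    if signed and narrow >> (width - 1):
        narrow -= 1 << width                     # negative in two's complement
    return narrow & ((1 << dest_bits) - 1)


def add_x(xn, wm, op):
    """Model of `add Xd, Xn, Wm, <op>`: extend to 64 bits, then add."""
    return (xn + extend(wm, op, 64)) & MASK64


def add_w(wn, wm, op):
    """Model of `add Wd, Wn, Wm, <op>`: extend to 32 bits, then add."""
    return (wn + extend(wm, op, 32)) & MASK32


w2 = 0x180                                # lowest byte is 0x80, i.e. -128 as signed 8-bit
print(hex(add_x(0, w2, "sxtb")))          # 0xffffffffffffff80
print(hex(add_w(0, w2, "sxtb")))          # 0xffffff80
print(hex(add_x(0, w2, "uxtb")))          # 0x80
```

With a signed operator the sign bit is replicated up to the width of the destination, so the 64-bit and 32-bit forms produce different bit patterns for the same input.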

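An extension can also be combined with a left shift of 1 to 4 bits. With x0 = 0 and x1 = 0x1234:

add x2, x0, w1, sxth #1 // x2 ← x0 + (ExtendSigned16To64(w1) << 1)
                          // this sets x2 to 0x2468
add x2, x0, w1, sxth #2 // x2 ← x0 + (ExtendSigned16To64(w1) << 2)
                          // this sets x2 to 0x48d0
add x2, x0, w1, sxth #3 // x2 ← x0 + (ExtendSigned16To64(w1) << 3)
                          // this sets x2 to 0x91a0
add x2, x0, w1, sxth #4 // x2 ← x0 + (ExtendSigned16To64(w1) << 4)
                          // this sets x2 to 0x12340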
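The four results quoted above (0x2468, 0x48d0, 0x91a0 and 0x12340) can be reproduced with a few lines of Python. The starting values x0 = 0 and x1 = 0x1234 are an assumption inferred from those results, and add_sxth is an illustrative helper that models a signed 16-bit extension followed by a shift.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, width):
    """Interpret the lowest `width` bits of value as a two's complement integer."""
    narrow = value & ((1 << width) - 1)
    return narrow - (1 << width) if narrow >> (width - 1) else narrow


def add_sxth(xn, wm, shift):
    """Compute xn + (ExtendSigned16To64(wm) << shift), modulo 2**64."""
    if not 0 <= shift <= 4:
        raise ValueError("the shift amount must be between 0 and 4")
    return (xn + (sign_extend(wm, 16) << shift)) & MASK64


x0, x1 = 0, 0x1234
for shift in range(1, 5):
    print(hex(add_sxth(x0, x1, shift)))
# prints 0x2468, 0x48d0, 0x91a0 and 0x12340, one per line
```

Because 0x1234 is positive as a 16-bit value, the sign extension leaves it unchanged here and only the shift is visible.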

 

System calls

The system call instructions are SWI and SVC.

SWI and SVC are the same instruction under two names. The SVC instruction was previously called SWI (software interrupt).