## Saturday, September 10, 2011

### Android ARM Assembly: Arithmetic and Logical Operations (Part 6)

This is part six in a series on learning ARM assembly on Android. This part covers arithmetic and logic expressions.

Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
Part 4: Gnu tools for assembly; GCC and GDB
Part 5: Stack and Functions
=> Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles form a series, and each one builds on the previous.

We covered a lot of data moving in the previous articles. Let's see what processing we can do.

#### Arithmetic
ARM processors provide the usual arithmetic operations. If you have experience with another processor, you might find the available operations somewhat limited. This follows Reduced Instruction Set (RISC) principles: there are a few basic primitives, and you are expected to build everything else from them. You give up complex instructions, but make up for it with faster code. More importantly, you make up for it with code that runs quiet and cool, and uses little battery power.

All arithmetic instructions look alike. Remember the MOV instructions? Their second operand can be one of:
1. An immediate value. Ex:
`mov	r0, #3`
2. A register. Ex:
`mov	r0, r1`
3. A register with a shift. Ex:
`mov	r0, r1, lsr #3`
4. A register with a register specifying the shift. Ex:
`mov	r0, r1, lsr r2`
The first part is called the destination register, since that's where the value is placed. The second part is called the shifter_operand, as shorthand for any one of these possibilities. In other words, MOV instructions have this structure:
MOV Rd, shifter_operand. This causes: Rd = shifter_operand
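
To make the MOV forms concrete, here is a minimal C sketch of what each one computes. The function names are hypothetical and exist only for this illustration; the ARM instruction being modeled is shown in the comment above each function.

```c
#include <stdint.h>

/* mov r0, #3 : an immediate value */
static uint32_t mov_imm(void) { return 3u; }

/* mov r0, r1 : a plain register */
static uint32_t mov_reg(uint32_t r1) { return r1; }

/* mov r0, r1, lsr #3 : a register shifted by a constant */
static uint32_t mov_lsr_imm(uint32_t r1) { return r1 >> 3; }

/* mov r0, r1, lsr r2 : a register shifted by another register.
 * ARM takes the shift amount from the bottom byte of r2, and a logical
 * shift right by 32 or more yields 0. C leaves such shifts undefined for
 * a 32-bit type, so this sketch handles that case explicitly. */
static uint32_t mov_lsr_reg(uint32_t r1, uint32_t r2) {
    uint32_t amount = r2 & 0xffu;
    return amount >= 32u ? 0u : r1 >> amount;
}
```

For example, `mov_lsr_reg(64, 3)` gives 8, the same value that `mov r0, r1, lsr r2` leaves in r0 when r1 holds 64 and r2 holds 3.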

Arithmetic instructions follow the same structure, but they take one more register: a first source operand, Rn, placed between the destination and the shifter_operand. This is the general structure of arithmetic instructions:
OPERATION Rd, Rn, shifter_operand

OPERATION can be one of:
ADD: (Addition) Rd = Rn + shifter_operand
SUB: (Subtraction) Rd = Rn - shifter_operand
RSB: (Reverse Subtraction) Rd = shifter_operand - Rn
ADC: (Add with Carry) Rd = Rn + shifter_operand + Carry
SBC: (Subtract with Carry) Rd = Rn - shifter_operand - NOT(Carry)
RSC: (Reverse Subtract with Carry) Rd = shifter_operand - Rn - NOT(Carry)
AND: (Logical And) Rd = Rn & shifter_operand
BIC: (Logical Bit Clear) Rd = Rn & ~shifter_operand
ORR: (Logical Or) Rd = Rn | shifter_operand
EOR: (Logical Exclusive Or) Rd = Rn ^ shifter_operand
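
As a rough illustration, the operations that do not involve the carry can be modeled in C as below. This is a sketch: the function names are made up for this example, `rn` stands for the Rn register, and `op2` stands for the shifter_operand after any shift has been applied. Note that the bit clear uses a bitwise NOT (`~`), and that 32-bit unsigned arithmetic wraps around just as the registers do.

```c
#include <stdint.h>

/* add r0, r1, op2 */
static uint32_t op_add(uint32_t rn, uint32_t op2) { return rn + op2; }

/* sub r0, r1, op2 */
static uint32_t op_sub(uint32_t rn, uint32_t op2) { return rn - op2; }

/* rsb r0, r1, op2 : operands swapped, so "rsb r0, r1, #0" negates r1 */
static uint32_t op_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

/* and r0, r1, op2 */
static uint32_t op_and(uint32_t rn, uint32_t op2) { return rn & op2; }

/* bic r0, r1, op2 : clear in rn every bit that is set in op2 */
static uint32_t op_bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

/* orr r0, r1, op2 */
static uint32_t op_orr(uint32_t rn, uint32_t op2) { return rn | op2; }

/* eor r0, r1, op2 */
static uint32_t op_eor(uint32_t rn, uint32_t op2) { return rn ^ op2; }
```

RSB looks odd at first, but it exists because only the last operand can be an immediate or a shifted register. Without it, "constant minus register" would need an extra instruction.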

The instructions are regular and follow the same pattern. The only unexplained thing is the Carry.

Carry is a bit in the status register CPSR that is set when an addition overflows past 32 bits. The flags are only updated when the instruction has the S suffix, as in ADDS. An overflow happens, for example, when adding 1 to 0xffffffff. The resulting number is 1 << 32, which cannot be represented in 32 bits. It needs 33 bits. In such a case, the carry bit will be set. Add with Carry (ADC) adds the carry bit in automatically, which lets you chain 32-bit additions into wider ones.
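
The classic use of the carry is a 64-bit addition built from two 32-bit additions. The C sketch below models the ADDS/ADC pair shown in its comments. The `u64_pair` type, the `add64` name, and the register assignments in the comments are assumptions made for this illustration.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit halves, as it would sit in two registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

static u64_pair add64(u64_pair a, u64_pair b) {
    u64_pair r;

    /* adds r0, r0, r2 : add the low words; the S suffix updates the carry flag */
    r.lo = a.lo + b.lo;
    /* If the 32-bit sum wrapped around, it is smaller than either input. */
    uint32_t carry = r.lo < a.lo;

    /* adc r1, r1, r3 : add the high words plus the carry from the low words */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Adding 1 to 0x00000000ffffffff this way gives a low word of 0 and a high word of 1, which is exactly the 33rd bit that a single 32-bit register could not hold.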

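The carry bit is also affected by subtraction, but in the opposite sense: ARM treats it as an inverted borrow. If the subtraction needs to borrow, for example when you subtract 1 from 0 and the unsigned result wraps around to 0xffffffff, the carry bit is cleared. If no borrow is needed, the carry bit is set. This is why SBC and RSC subtract an extra 1 only when the carry bit is clear.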
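
A 64-bit subtraction shows how the flag is used on the subtract side. On ARM, the carry flag after SUBS is 1 when no borrow was needed and 0 when a borrow occurred, and SBC subtracts an extra 1 only when the carry is 0. The C sketch below models that behaviour. The `u64_pair` type, the `sub64` name, and the register assignments in the comments are assumptions made for this illustration.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit halves, as it would sit in two registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

static u64_pair sub64(u64_pair a, u64_pair b) {
    u64_pair r;

    /* subs r0, r0, r2 : subtract the low words and update the flags.
     * Carry = 1 means "no borrow", carry = 0 means "borrow". */
    r.lo = a.lo - b.lo;
    uint32_t carry = a.lo >= b.lo;

    /* sbc r1, r1, r3 : Rd = Rn - shifter_operand - NOT(carry) */
    r.hi = a.hi - b.hi - (1u - carry);
    return r;
}
```

Subtracting 1 from 0x0000000100000000 borrows from the high word: the low word becomes 0xffffffff and the high word drops from 1 to 0.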

#### Multiply
Multiply instructions take only register operands. They don't accept a shifter_operand.

MLA Rd, Rm, Rs, Rn: (Multiply with Accumulate) Rd = Rm * Rs + Rn
MUL Rd, Rm, Rs: (Multiply) Rd = Rm * Rs
SMLAL Rd32, Rd64, Rm, Rs: (Signed Multiply with Accumulate Long) Rd64,Rd32 += Rm * Rs
SMULL Rd32, Rd64, Rm, Rs: (Signed Multiply Long) Rd64,Rd32 = Rm * Rs
UMLAL Rd32, Rd64, Rm, Rs: (Unsigned Multiply with Accumulate Long) Rd64,Rd32 += Rm * Rs
UMULL Rd32, Rd64, Rm, Rs: (Unsigned Multiply Long) Rd64,Rd32 = Rm * Rs

MLA and MUL are easy to understand. The difference between the signed and unsigned instructions is that the signed ones treat bit 31 of each operand as a sign bit and use 2's complement arithmetic. If you know your numbers are unsigned, the unsigned variants let you represent larger values.

The Long versions of the instructions store the lower 32 bits of the result in Rd32 and the upper 32 bits in Rd64. MLA and MUL store only the lower 32 bits of the result, so if the product of two large numbers does not fit in 32 bits, the result is silently truncated.
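
The difference between the plain, long, signed, and unsigned multiplies can be sketched in C with 64-bit intermediate values. The function names are hypothetical, and the `lo` and `hi` pointers stand in for the Rd32 and Rd64 registers.

```c
#include <stdint.h>

/* mul r0, r1, r2 : only the low 32 bits of the product are kept */
static uint32_t mul32(uint32_t rm, uint32_t rs) {
    return (uint32_t)((uint64_t)rm * rs);
}

/* umull lo, hi, rm, rs : full 64-bit product of two unsigned values */
static void umull(uint32_t rm, uint32_t rs, uint32_t *lo, uint32_t *hi) {
    uint64_t product = (uint64_t)rm * rs;
    *lo = (uint32_t)product;
    *hi = (uint32_t)(product >> 32);
}

/* smull lo, hi, rm, rs : the operands are read as 2's complement numbers */
static void smull(int32_t rm, int32_t rs, uint32_t *lo, uint32_t *hi) {
    uint64_t bits = (uint64_t)((int64_t)rm * rs);
    *lo = (uint32_t)bits;
    *hi = (uint32_t)(bits >> 32);
}

/* umlal lo, hi, rm, rs : add the product to the 64-bit value already in hi:lo */
static void umlal(uint32_t rm, uint32_t rs, uint32_t *lo, uint32_t *hi) {
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)rm * rs;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

The same bit pattern gives different high words depending on the variant. Multiplying 0xffffffff by 2 with the unsigned model yields a high word of 1, while the signed model reads the operand as -1 and yields a high word of 0xffffffff (the sign extension of -2). The low word is 0xfffffffe in both cases.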

A multiply is slower than using the barrel shifter. If you need to multiply by a power of two, such as 4, you are much better off supplying "lsl #2".
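
Because the shift is part of the shifter_operand, it can be combined with an ADD or RSB to multiply by some constants that are not powers of two. The C sketch below models three such cases. The function names and the register choices in the comments are illustrative assumptions.

```c
#include <stdint.h>

/* mov r0, r1, lsl #2 : r0 = r1 * 4 */
static uint32_t times4(uint32_t r1) { return r1 << 2; }

/* add r0, r1, r1, lsl #2 : r0 = r1 + r1 * 4 = r1 * 5 */
static uint32_t times5(uint32_t r1) { return r1 + (r1 << 2); }

/* rsb r0, r1, r1, lsl #3 : r0 = r1 * 8 - r1 = r1 * 7 */
static uint32_t times7(uint32_t r1) { return (r1 << 3) - r1; }
```

Each of these is a single instruction. As with MUL, only the low 32 bits of the result are kept, so large inputs wrap around.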