Saturday, September 10, 2011

Android ARM Assembly: Arithmetic and Logical Operations (Part 6)

This is part six in a series on learning ARM assembly on Android. This part covers arithmetic and logic expressions.

Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
Part 4: Gnu tools for assembly; GCC and GDB
Part 5: Stack and Functions
=> Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles follow in series; each article builds on the previous one.

The previous articles covered a lot of data movement. Now let's see what processing we can do.

ARM processors provide the usual arithmetic operations. If you have experience with some other processor, you might find the available operations somewhat limited. This follows Reduced Instruction Set Computing (RISC) principles: there are a few basic primitives, and you build everything else from them. You lose complex instructions, but you make up for it with faster code. More importantly, you make up for it with code that runs quietly, stays cool, and uses little battery power.

All arithmetic instructions share the same structure. Remember the operand forms the MOV instruction accepts?
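  1. An immediate value. Ex:
    mov	r0, #3
  2. A register. Ex:
    mov	r0, r1
  3. A register rotated right with extend (RRX), which shifts the register right by one bit and moves the carry bit (described below) into bit 31. Ex:
    mov	r0, r1, rrx
  4. A register with a shift by an immediate amount. Ex:
    mov	r0, r1, lsr #3
  5. A register with a second register specifying the shift amount. Ex:
    mov	r0, r1, lsr r2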
The first operand is called the destination register, since that's where the result is placed. The second is called the shifter_operand, a shorthand for any one of the forms above. In other words, MOV instructions have this structure:
MOV Rd, shifter_operand. This causes: Rd = shifter_operand
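To make the shifter_operand concrete, here is a small C model of the value it delivers for ARM's four plain shift types (LSL, LSR, ASR, ROR). The function names are hypothetical, and the sketch assumes shift amounts from 0 to 31. When the amount comes from a register, as in "lsr r2", ARM uses the bottom byte of that register, and amounts of 32 or more follow extra rules that this model ignores.

```c
#include <stdint.h>

/* Hypothetical C model of the value a shifter_operand of the form
 * "Rm, <shift> #amount" produces. Assumes 0 <= amount <= 31. */

uint32_t shift_lsl(uint32_t rm, unsigned amount) {
    return rm << amount;                 /* zeros enter from the right */
}

uint32_t shift_lsr(uint32_t rm, unsigned amount) {
    return rm >> amount;                 /* zeros enter from the left */
}

uint32_t shift_asr(uint32_t rm, unsigned amount) {
    uint32_t result = rm >> amount;
    if (amount > 0 && (rm & 0x80000000u))
        result |= ~(0xFFFFFFFFu >> amount);  /* copies of bit 31 fill the vacated top bits */
    return result;
}

uint32_t shift_ror(uint32_t rm, unsigned amount) {
    if (amount == 0)
        return rm;                       /* avoids an undefined shift by 32 below */
    return (rm >> amount) | (rm << (32 - amount));  /* bits leaving the right re-enter on the left */
}
```

For example, "mov r0, r1, lsr #3" leaves the equivalent of shift_lsr(r1, 3) in r0. ASR differs from LSR only for values with bit 31 set, which is what makes it suitable for dividing signed numbers by powers of two.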

Arithmetic instructions follow the same structure, but instead of one register they take two: the destination Rd and a first operand Rn, followed by the shifter_operand. This is the general structure of arithmetic instructions:
OPERATION Rd, Rn, shifter_operand

OPERATION can be one of:
ADD: (Addition) Rd = Rn + shifter_operand
SUB: (Subtraction) Rd = Rn - shifter_operand
RSB: (Reverse Subtraction) Rd = shifter_operand - Rn
ADC: (Add with Carry) Rd = Rn + shifter_operand + Carry
SBC: (Subtract with Carry) Rd = Rn - shifter_operand - NOT(Carry)
RSC: (Reverse Subtract with Carry) Rd = shifter_operand - Rn - NOT(Carry)
AND: (Logical And) Rd = Rn & shifter_operand
BIC: (Logical Bit Clear) Rd = Rn & ~shifter_operand
ORR: (Logical Or) Rd = Rn | shifter_operand
EOR: (Logical Exclusive Or) Rd = Rn ^ shifter_operand
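The operations that don't involve the carry can be modelled in C with 32-bit unsigned arithmetic, which wraps around modulo 2^32 just as the registers do. This is a sketch with a hypothetical enum and function name; op2 stands for the shifter_operand value, already computed.

```c
#include <stdint.h>

/* Hypothetical names for the carry-free operations in the table above. */
enum dp_op { OP_ADD, OP_SUB, OP_RSB, OP_AND, OP_BIC, OP_ORR, OP_EOR };

/* Models "OP Rd, Rn, shifter_operand" and returns the value written to Rd.
 * uint32_t arithmetic wraps modulo 2^32, like the 32-bit registers. */
uint32_t data_processing(enum dp_op op, uint32_t rn, uint32_t op2) {
    switch (op) {
    case OP_ADD: return rn + op2;
    case OP_SUB: return rn - op2;
    case OP_RSB: return op2 - rn;    /* operands reversed */
    case OP_AND: return rn & op2;
    case OP_BIC: return rn & ~op2;   /* Rn AND (NOT op2) */
    case OP_ORR: return rn | op2;
    case OP_EOR: return rn ^ op2;
    }
    return 0;                        /* not reached for valid ops */
}
```

For example, data_processing(OP_RSB, x, 0) computes 0 - x, which is why "rsb r0, r1, #0" is a common way to negate a register: RSB exists so that the immediate or shifted value can appear on the left of the subtraction.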

The instructions are regular and follow the same pattern. The only unexplained thing is the Carry.

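Carry is a bit in the status register CPSR that is set when an addition overflows past 32 bits. This happens, for example, when adding 1 to 0xffffffff: the result is 2^32 (0x100000000), which cannot be represented in 32 bits because it needs 33. In such a case, the carry bit is set. Note that the flags, including carry, are only updated when you append S to the instruction, as in ADDS; a plain ADD leaves them untouched. Add with carry (ADC) then adds the carry bit in automatically, which lets you chain 32-bit additions into wider ones.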
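The usual use of ADC is arithmetic wider than 32 bits. The C sketch below, with hypothetical function names, models ADDS (which sets the carry) and ADC, then combines them into a 64-bit addition. The register assignment in the comment (low words in r0 and r2, high words in r1 and r3) is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical model of ADDS: returns Rn + op2 modulo 2^32 and stores the
 * carry-out (1 when the true sum needs a 33rd bit) in *carry. */
uint32_t adds(uint32_t rn, uint32_t op2, unsigned *carry) {
    uint32_t sum = rn + op2;
    *carry = sum < rn;               /* the sum wrapped past 0xFFFFFFFF */
    return sum;
}

/* Hypothetical model of ADC (no flag update): Rn + op2 + Carry. */
uint32_t adc(uint32_t rn, uint32_t op2, unsigned carry) {
    return rn + op2 + carry;
}

/* 64-bit addition from 32-bit halves, mirroring
 *     adds r0, r0, r2   @ low words, sets the carry
 *     adc  r1, r1, r3   @ high words plus the carry */
uint64_t add64(uint64_t a, uint64_t b) {
    unsigned carry;
    uint32_t lo = adds((uint32_t)a, (uint32_t)b, &carry);
    uint32_t hi = adc((uint32_t)(a >> 32), (uint32_t)(b >> 32), carry);
    return ((uint64_t)hi << 32) | lo;
}
```

The key point is that the carry produced by the low-word addition is exactly the 1 that must be added to the high words.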

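Subtraction also affects the carry bit, but in the opposite sense: ARM sets carry when a subtraction does not need to borrow (Rn >= shifter_operand, treating both as unsigned numbers) and clears it when it does. For example, SUBS computing 0 - 1 clears the carry, while 5 - 3 sets it. As a result, SBC and RSC actually subtract NOT(Carry): they take away an extra 1 only when the previous subtraction borrowed. Keep in mind that carry tracks unsigned overflow; signed overflow is tracked by a separate bit, the V (overflow) flag.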
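Subtraction chains the same way, with the inverted carry. This C sketch (hypothetical names) models SUBS and SBC and uses them for a 64-bit subtraction; the register assignment in the comment is again only an assumption.

```c
#include <stdint.h>

/* Hypothetical model of SUBS: returns Rn - op2 modulo 2^32. Carry is set
 * when NO borrow is needed (Rn >= op2, unsigned) and cleared otherwise. */
uint32_t subs(uint32_t rn, uint32_t op2, unsigned *carry) {
    *carry = rn >= op2;
    return rn - op2;
}

/* Hypothetical model of SBC (no flag update): Rn - op2 - NOT(Carry). */
uint32_t sbc(uint32_t rn, uint32_t op2, unsigned carry) {
    return rn - op2 - (1u - carry);
}

/* 64-bit subtraction from 32-bit halves, mirroring
 *     subs r0, r0, r2   @ low words, carry = NOT borrow
 *     sbc  r1, r1, r3   @ high words, minus the borrow */
uint64_t sub64(uint64_t a, uint64_t b) {
    unsigned carry;
    uint32_t lo = subs((uint32_t)a, (uint32_t)b, &carry);
    uint32_t hi = sbc((uint32_t)(a >> 32), (uint32_t)(b >> 32), carry);
    return ((uint64_t)hi << 32) | lo;
}
```

Subtracting 1 from 0x100000000 shows the borrow at work: the low words give 0xFFFFFFFF with the carry cleared, so SBC takes an extra 1 off the high word.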

Multiply instructions take only register operands; they don't accept a shifter_operand.

MLA Rd, Rm, Rs, Rn: (Multiply with Accumulate) Rd = Rm * Rs + Rn
MUL Rd, Rm, Rs: (Multiply) Rd = Rm * Rs
SMLAL Rd32, Rd64, Rm, Rs: (Signed Multiply with Accumulate Long) Rd64,Rd32 += Rm * Rs
SMULL Rd32, Rd64, Rm, Rs: (Signed Multiply Long) Rd64,Rd32 = Rm * Rs
UMLAL Rd32, Rd64, Rm, Rs: (Unsigned Multiply with Accumulate Long) Rd64,Rd32 += Rm * Rs
UMULL Rd32, Rd64, Rm, Rs: (Unsigned Multiply Long) Rd64,Rd32 = Rm * Rs

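MLA and MUL are easy to understand. Signed instructions treat bit 31 as a sign bit and use 2's complement arithmetic, while unsigned instructions treat all 32 bits as magnitude. If you know your numbers are unsigned, the unsigned variants let you represent larger values. The distinction only matters for the Long instructions: the low 32 bits of a product are the same whether the operands are read as signed or unsigned, which is why a single MUL serves both.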

The Long instructions store the lower 32 bits of the 64-bit result in Rd32 and the upper 32 bits in Rd64. MLA and MUL store only the lower 32 bits, so if the product of two large numbers does not fit in 32 bits, the upper bits are silently discarded and the result is wrong.
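The following C sketch models UMULL, SMULL, and MUL to show where the two halves of a long product go and why the signed and unsigned variants differ. The function names are hypothetical, and the signed version relies on converting uint32_t to int32_t with two's complement wrapping, which standard C leaves implementation-defined but GCC and Clang do.

```c
#include <stdint.h>

/* Hypothetical model of UMULL: Rd32 gets the low word, Rd64 the high word. */
void umull(uint32_t *rd32, uint32_t *rd64, uint32_t rm, uint32_t rs) {
    uint64_t product = (uint64_t)rm * rs;
    *rd32 = (uint32_t)product;
    *rd64 = (uint32_t)(product >> 32);
}

/* Hypothetical model of SMULL: the register bits are read as 2's complement. */
void smull(uint32_t *rd32, uint32_t *rd64, uint32_t rm, uint32_t rs) {
    int64_t product = (int64_t)(int32_t)rm * (int32_t)rs;
    uint64_t bits = (uint64_t)product;
    *rd32 = (uint32_t)bits;
    *rd64 = (uint32_t)(bits >> 32);
}

/* Hypothetical model of MUL: only the low word survives. */
uint32_t mul(uint32_t rm, uint32_t rs) {
    return rm * rs;
}
```

Multiplying 0xFFFFFFFF by 2 shows the difference: umull gives a high word of 1 (4294967295 * 2), while smull gives 0xFFFFFFFF (-1 * 2 = -2). Both give the same low word, 0xFFFFFFFE, which is also what mul returns.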

A multiply is generally slower than using the barrel shifter. If you need to multiply by 4, you are much better off supplying "lsl #2" as the shifter_operand, as in "mov r0, r1, lsl #2".
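The barrel shifter also handles many constants that are not powers of two, by combining a shift with an add or a reverse subtract. The C sketch below (hypothetical function names) computes the same value as the ARM instruction in each comment, assuming the input is in r1 and the result goes to r0.

```c
#include <stdint.h>

/* Hypothetical helpers: multiply by small constants using only a shift
 * and at most one add or subtract, wrapping modulo 2^32 like MUL. */
uint32_t times4(uint32_t x) {
    return x << 2;              /* mov r0, r1, lsl #2 */
}

uint32_t times5(uint32_t x) {
    return x + (x << 2);        /* add r0, r1, r1, lsl #2   (x + 4x) */
}

uint32_t times7(uint32_t x) {
    return (x << 3) - x;        /* rsb r0, r1, r1, lsl #3   (8x - x) */
}
```

RSB is what makes the times7 case a single instruction: the shifted value is the shifter_operand, so it has to sit on the left of the subtraction.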

Arithmetic and logical instructions are straightforward on ARM processors. I have covered the instructions available in ARMv5, which are available on nearly every ARM processor you will encounter. ARMv6 adds a few instructions, but thanks to the regular structure, once you understand the v5 set you can pick up the v6 additions immediately.

You can try the following to test your knowledge so far:
  1. Write a function to input an integer y and return y * (y-1). Check by calling it inside main() and using the program from Part 5 to print the value.
  2. Test out the behaviour of BIC using a program and different inputs. Why is it called Bit Clear?
As before, the ARM Architecture Reference Manual has more information on all of these operations.