Wednesday, September 07, 2011

Android ARM Assembly: Registers, Memory and Addressing Modes (Part 3)

This is part three of a series on learning ARM assembly on Android. This part covers registers, memory and addressing modes.

Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
=> Part 3: Registers, memory, and addressing modes
Part 4: GNU tools for assembly; GCC and GDB
Part 5: Stack and Functions
Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles follow in series; each article builds on the previous one.

Registers

On ARM processors you have 16 registers. Actually, that is not entirely true. ARM processors have more registers than that (the ARMv5 architecture, for example, defines 37, counting the status registers), each 32 bits wide. ARM processors have different processor modes to distinguish user-level and system-level access, and only some registers are visible in each mode. In user mode you can access 16 registers. This is the mode you will use most frequently, so you can ignore the whole business of modes for now. By the time you need to write Linux device drivers, you will be well past this introduction.

The registers are called r0-r15, and the last four are special.
r12: IP, or Intra-Procedure-call scratch register. The linker may use this register as a scratch register between procedure calls, so a procedure cannot rely on its value surviving a call. This register isn't used by Linux gcc or glibc, but another system might use it.
r13: SP, or Stack Pointer. This register points to the top of the stack. The stack is an area of memory used for local, function-specific storage. This storage is reclaimed when the function returns. To allocate space on the stack, we subtract from the stack pointer. To allocate one 32-bit value, we subtract 4 from the stack pointer.
r14: LR, or Link Register. This register holds the return address of a subroutine. When a subroutine is called, the LR is filled with the address of the instruction that follows the call.
r15: PC, or Program Counter. This register holds the address of memory that is currently being executed.
There is one more register, the Current Program Status Register (CPSR), which contains flags such as Negative, Zero, Carry, and so on. We'll visit it later; you can't read and write it like a normal register anyway.
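The stack pointer arithmetic described for r13 can be sketched in C. This is only a model of the address calculation, not code from this series: the function names stack_alloc and stack_release are made up for illustration, and the stack pointer is treated as a plain 32-bit number.

```c
#include <stdint.h>

/* Model of "sub sp, sp, #(4 * words)": the stack grows towards lower
 * addresses, so allocating space means subtracting from sp.
 * (Hypothetical helper, for illustration only.) */
uint32_t stack_alloc(uint32_t sp, uint32_t words)
{
    return sp - 4u * words;
}

/* Model of "add sp, sp, #(4 * words)": giving the space back. */
uint32_t stack_release(uint32_t sp, uint32_t words)
{
    return sp + 4u * words;
}
```

Allocating one 32-bit value from a stack pointer of 0x1000 gives 0x0FFC, and releasing it again restores 0x1000.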

In assembly language, these are all the variables you have access to. If you need to perform any computation, it has to operate on these registers. Since there are only a few registers, you need to be judicious in their use. A lot of optimisation boils down to being miserly with register allocation.

Here are all the data move instructions:
	mov	r0, r1
This moves the value from r1 to r0. This achieves r0 = r1.
 	mov	r0, r1, lsl #2
Logical Shift Left (lsl) the value in r1 by 2 bits, and then move it to r0. This achieves r0 = r1 << 2. There is a limit on the shift amount: it is specified using five bits, so any value between 0 and 31 is allowed. The effect of a single left shift is multiplication by 2.

 	mov	r0, r1, lsr #3
Logical Shift Right (lsr) the value in r1 by 3 bits, and then move it to r0. This achieves r0 = (int) (r1 >> 3). The effect of a single right shift is (the integer part of) division by 2.
 	mov	r0, r1, lsl r2
Logical Shift Left (lsl) the value in r1 by the amount given in r2, and then move it to r0. The value in register r2 should be between 0 and 31. This achieves r0 = (int) (r1 << r2).
LSL and LSR are not the only shifts. There are also arithmetic shifts, ASL and ASR, which maintain signedness. ASR differs from LSR for negative numbers. Negative numbers have their leading bits set to 1, and ASR propagates the 1 bits when shifting right. In case you don't remember how negative numbers are represented, you might want to read a two's complement review. ASL is the same as LSL because no sign extension happens when shifting left. Finally there is ROtate Right, or ROR. Rotation preserves all information: the bits are cycled like the barrel of a six-shooter. The rotation amount is specified using five bits, which means 0-31 positions. There is no rotate left, since to rotate left by 3 bits you can rotate right by 32-3 = 29 bits. Every shift can take either a five-bit value or a register that contains a value between 0 and 31.
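To make the difference between the shift types concrete, here is a minimal C model of what each one computes. The function names are invented for this sketch, and it assumes the shift amount n is between 0 and 31, as in the text. ASR is written without relying on C's right shift of negative numbers, which is implementation-defined.

```c
#include <stdint.h>

/* mov r0, r1, lsl #n : zeros enter from the right. */
uint32_t lsl(uint32_t r1, unsigned n)
{
    return r1 << n;
}

/* mov r0, r1, lsr #n : zeros enter from the left. */
uint32_t lsr(uint32_t r1, unsigned n)
{
    return r1 >> n;
}

/* mov r0, r1, asr #n : copies of the sign bit enter from the left. */
uint32_t asr(uint32_t r1, unsigned n)
{
    uint32_t shifted = r1 >> n;
    if (r1 & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> n);  /* fill the vacated top n bits with 1s */
    return shifted;
}

/* mov r0, r1, ror #n : bits leaving on the right re-enter on the left. */
uint32_t ror(uint32_t r1, unsigned n)
{
    return (r1 >> n) | (r1 << ((32u - n) & 31u));
}
```

With these definitions, asr(0xFFFFFFF0, 2) keeps the number negative (0xFFFFFFFC, that is, -16 becomes -4), while lsr(0xFFFFFFF0, 2) gives the large positive value 0x3FFFFFFC. Calling ror(x, 29) has the same effect as rotating x left by 3.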
 	mov	r0,#10
Move the literal value 10 to r0. This achieves r0 = 10. This is an example of immediate addressing: the literal value of the constant is given in the instruction itself. There is a limit to what can be expressed as an immediate value. Any number that can be formed from an 8-bit constant rotated right by an even number of bits is allowed (the rotation is stored in 4 bits and doubled). Anything other than that must be declared as a constant and loaded from memory. Examples of valid values are 0x10 and 0xFF (both using the constant alone), and 0x100 and 0x40000 (using the rotation).
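The rule for valid immediates is easier to see as code. The C sketch below checks whether a value fits the ARM (not Thumb) form of an 8-bit constant rotated right by an even amount. The function name is_arm_immediate is made up for this example; the assembler performs an equivalent check for you and reports an error when a constant does not fit.

```c
#include <stdint.h>

/* Returns 1 if value can be written as an 8-bit constant rotated right
 * by an even amount (0, 2, ..., 30), and 0 otherwise.
 * (Hypothetical helper, for illustration only.) */
int is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot bits by rotating left by rot bits. */
        uint32_t undone = (value << rot) | (value >> ((32u - rot) & 31u));
        if (undone <= 0xFFu)
            return 1;
    }
    return 0;
}
```

Under this check 10, 0x100 and 0x40000 are valid immediates. 0x101 is not, because its two set bits are too far apart to fit in 8 bits, and neither is 0x102, because it would need an odd rotation.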

Loading from and Storing to Memory

The memory on the ARM architecture is laid out as a flat array. No more segment register nonsense of the Intel world. Addressing modes are equally straightforward and easy to remember.
 	ldr	r0, .L3
Move the contents of the memory at label .L3 to r0. This achieves r0 = *(.L3).
 	ldr	r0, [r1]
Move the contents of data pointed to by r1 to r0. This achieves r0 = *(r1). This addressing mode is using a register for a base address.
 	ldr	r0, [r1, #4]
Move the contents of data pointed to by (r1 + 4) to r0. If r1 is treated as a pointer to 32-bit values, this achieves r0 = *(r1 + 1), since 32 bits are moved at a time. Word alignment might enforce this, so you might not be able to do "ldr r0, [r1, #1]". This addressing mode uses an immediate offset value.
 	ldr	r0, [r1, r2]
Move the contents of data pointed to by (r1 + r2) to r0. This achieves r0 = *(r1 + (r2/4)). This addressing mode uses a register as an offset, in addition to a register as a base address.
 	ldr	r0, [r1, -r2]
Offsets can also be negative. This moves the contents of data pointed to by (r1 - r2) to r0.
 	ldr	r0, [r1, r2, lsl #4]
Move the contents of data pointed to by (r1 + (r2 logically shifted left by four bits)) to r0. This achieves r0 = *(r1 + ((r2 << 4)/4)). This addressing mode uses a register for the base address, and a shifted register for the offset.
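The offset forms above all compute a byte address and load one 32-bit word from it, leaving the base register unchanged. Here is a rough C model of that address calculation. The helper names are invented for this sketch, and memory is modelled as ordinary bytes addressed through a C pointer.

```c
#include <stdint.h>
#include <string.h>

/* Load the 32-bit word stored at byte address base + offset. */
static uint32_t load_word(const uint8_t *base, int32_t offset)
{
    uint32_t word;
    memcpy(&word, base + offset, sizeof word);
    return word;
}

/* ldr r0, [r1, #imm]  (imm = 0 gives plain "ldr r0, [r1]") */
uint32_t ldr_imm(const uint8_t *r1, int32_t imm)
{
    return load_word(r1, imm);
}

/* ldr r0, [r1, r2]  (pass a negative r2 to model "[r1, -r2]") */
uint32_t ldr_reg(const uint8_t *r1, int32_t r2)
{
    return load_word(r1, r2);
}

/* ldr r0, [r1, r2, lsl #shift] */
uint32_t ldr_reg_lsl(const uint8_t *r1, uint32_t r2, unsigned shift)
{
    return load_word(r1, (int32_t)(r2 << shift));
}
```

If r1 points at an array of 32-bit values, ldr_imm(r1, 4) returns element 1, and ldr_reg_lsl(r1, 1, 4) returns element 4, because 1 << 4 is 16 bytes, which is four words.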
You get the picture. The load instructions take the same barrel shift arguments as the move instructions, so LSL, LSR, ASL, ASR, and ROR are all valid. The interesting stuff happens when you use a register as a pointer and modify the pointer as part of the same load. There are two ways of doing this: post-index addressing and pre-index addressing. These correspond roughly to the postincrement (a++) and preincrement (++a) operators in C. Here is post-index addressing:
 	ldr	r0, [r1], #4
Move the contents of data pointed to by r1 to r0, and store the value r1+4 in r1. This achieves r0 = *(r1), r1 = r1 + 1. As before, variants that use registers or shifted registers as offsets are valid.
 	ldr	r0, [r1], r2
 	ldr	r0, [r1], r2, asl #4
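Post-index addressing is what makes walking through an array cheap: one instruction loads the value and advances the pointer. The C sketch below models that behaviour. The names ldr_post and sum_words are hypothetical, and the base register is passed by address so that the model can update it.

```c
#include <stdint.h>
#include <string.h>

/* ldr r0, [r1], #step : load from the old r1, then r1 = r1 + step. */
uint32_t ldr_post(const uint8_t **r1, int32_t step)
{
    uint32_t r0;
    memcpy(&r0, *r1, sizeof r0);  /* use the address before the update */
    *r1 += step;                  /* write the new address back to r1 */
    return r0;
}

/* Sum count words the way a loop around "ldr r0, [r1], #4" would. */
uint32_t sum_words(const uint32_t *words, unsigned count)
{
    const uint8_t *r1 = (const uint8_t *)words;
    uint32_t total = 0;
    while (count--)
        total += ldr_post(&r1, 4);
    return total;
}
```

Each call returns the word at the current address and leaves r1 pointing at the next word, much like *p++ on a pointer to 32-bit values.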

In pre-indexed addressing, we modify the base address register before loading the value from memory into the register. This is indicated by an exclamation mark at the end of the address, which means that the updated address is written back to the base register first.
 	ldr	r0, [r1, #4]!
Increase the content of r1 by 4, and then move the contents of data pointed to by (r1) to r0. This achieves r1 = r1 + 4, r0 = *(r1).
 	ldr	r0, [r1, r2]!
Increase the content of r1 by the contents of r2, and then move the contents of data pointed to by (r1) to r0. This achieves r1 = r1 + r2, r0 = *(r1).
 	ldr	r0, [r1, r2, lsl #4]!
You get the idea. Any of the registers r0-r15 can be used where r0, r1, and r2 are used in the examples above.
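For comparison with post-indexing, here is a small C model of the pre-indexed form. As before, this is only an illustration: the name ldr_pre is made up, and the base register is passed by address so that the write-back is visible to the caller.

```c
#include <stdint.h>
#include <string.h>

/* ldr r0, [r1, #step]! : r1 = r1 + step first, then load from the new r1. */
uint32_t ldr_pre(const uint8_t **r1, int32_t step)
{
    uint32_t r0;
    *r1 += step;                  /* update the base register first */
    memcpy(&r0, *r1, sizeof r0);  /* then load from the updated address */
    return r0;
}
```

Starting with r1 at the first element of an array of 32-bit values, ldr_pre(&r1, 4) returns the second element and leaves r1 pointing at it. This is the ++a behaviour, whereas post-indexing returns the first element and then advances.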
All those examples dealt with loading contents from memory to registers. For storing to memory, we use the STR opcode instead of the LDR opcode. The same addressing modes can be used to store values from registers into memory. Here are a few examples.
 	str	r0, [r1], #4
Move the contents of r0 into the memory pointed to by r1, and store the value r1+4 in r1. This achieves *(r1) = r0, r1 = r1 + 1. As before, variants that use registers or shifted registers as offsets are valid.
 	str	r0, [r1]
Move the contents of r0 into the address pointed to by register r1. This achieves *(r1) = r0.
 	str	r0, [r1], r2
Move the contents of r0 into the address pointed to by r1, and increment r1 by the contents of r2. This achieves *(r1) = r0, r1 = r1 + r2.
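Stores mirror loads, so a post-indexed store can be modelled in C the same way. In this sketch the names str_post and fill_words are hypothetical. The loop fills an array the way repeated "str r0, [r1], #4" instructions would.

```c
#include <stdint.h>
#include <string.h>

/* str r0, [r1], #step : store at the old r1, then r1 = r1 + step. */
void str_post(uint8_t **r1, uint32_t r0, int32_t step)
{
    memcpy(*r1, &r0, sizeof r0);  /* write to the address before the update */
    *r1 += step;                  /* write the new address back to r1 */
}

/* Fill count words with value, one post-indexed store per word. */
void fill_words(uint32_t *dst, uint32_t value, unsigned count)
{
    uint8_t *r1 = (uint8_t *)dst;
    while (count--)
        str_post(&r1, value, 4);
}
```

After fill_words(buffer, 7, 4), the first four words of buffer hold 7 and anything after them is untouched.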

References
Register move instructions and indexing modes are perhaps the hardest part of ARM assembly. You should take solace in the fact that the addressing modes are considerably less complicated than the madness of segment registers on Intel processors.

This is a basic introduction to the registers and addressing modes in ARM processors. In case you want more depth, Jasper Vijn's excellent introduction to ARM assembly covers these topics.

The authoritative technical reference for the ARM architecture is the ARM Architecture Reference Manual. It contains valuable technical information on all instructions in the ARM instruction set.