Monday, September 12, 2011

Android ARM Assembly: Conditional execution (Part 7)

This is part seven in a series on learning ARM assembly on Android. This part covers conditional execution.

Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
Part 4: Gnu tools for assembly; GCC and GDB
Part 5: Stack and Functions
Part 6: Arithmetic and Logical Expressions
=> Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles follow in series, each article builds on the previous.

Compare and Jump
The previous lessons touched on a variety of microprocessor instructions. Let's move to conditional execution. These are the basis of all the if, while, for loops in programming languages.

Conditions are implemented in microprocessors using the Status Register CPSR. The various bits on the status register are set using instructions. Then, you can jump to a different position in the code if the specific bit is set or unset. While this scheme looks primitive, it forms the basis of all conditional execution in every language on computers. Let's see the status register again.

The conditions in the status register are:
  1. Negative: the result was negative (bit 31 of the result was set)
  2. Zero: the result was zero (all bits of the result were unset)
  3. Carry: integer addition, subtraction or shifts produce a carry or borrow (result bits could be anything)
  4. oVerflow: there was carry or borrow during signed addition or subtraction (result bits could be anything)
The status register has other bits relating to the different processor modes, but we can ignore them for now. The bits in the CPSR are set on four instructions: CMP, CMN, TEQ, TST. Let's see these instructions.

CMP Rn, shifter_operand

This performs Rn - shifter_operand, throws away the result and updates the status register. So doing a CMP R2, R2 would produce a zero output, and the Zero condition would be set. It is important to note that the result is never stored anywhere. From your perspective, you are asking the processor what the result would be like.

This is a full list of the comparison instructions.

InstructionRough IntentionThe condition flags are set based on this value
CMP Rn, shifter_operand(Addition)Rn - shifter_operand
CMN Rn, shifter_operand(Negation)Rn + shifter_operand
TST Rn, shifter_operand(Test)Rn & shifter_operand
TEQ Rn, shifter_operand(Test Equivalence)Rn ^ shifter_operand

Once the condition is set, you can use that condition to jump to a specific label. The easiest jump instruction doesn't care for flags. We saw it in part 5, where it was used to return execution back from our function.
        bx      lr
This instruction branched execution to the address contained in the Link Register. This could be done with any register, though you better be sure that the register contains a valid address. It also needs to be a 32-bit aligned address. All ARM instructions are 32-bit (word) aligned. This isn't Intel, every instruction is exactly 32 bits long. BX is a Branch, while BL is a Branch and Link.

Branch just jumps to the address specified in the register.

Branch and Link stores the address of the existing Program Counter in the Link Register, in case you want to jump back. This is something you do all the time, so there is one instruction to handle it.

ARM and Thumb Instructions
Most new ARM processors support two different instruction sets: ARM and Thumb. Thumb instructions are a subset of ARM instructions and are 16 bits each. Going back and forth between ARM and Thumb is possible, though it should be done correctly. The branch instructions ending in "X" do this. A "BX" instruction allows you to switch between ARM and Thumb while the "B" instruction doesn't. In general, you should use "BX" and "BLX" when you are unsure of the resulting code. If you are certain, you can use "B" for branch, and "BL" for branch and link.

Condition Codes
How about adding the actual condition? The condition goes as a suffix to the instruction. Say you only want to check if the Zero condition was true. Then, the condition is called EQual, with mnemonic EQ. So this operation only branches when the zero condition is set:
BLEQ lr

The full list of conditions is:
SuffixMeaningCondition tested
EQEqualZ == 1
NENot EqualZ == 0
CS or HSUnsigned Higher or Same (Carry Set)C == 1
CC or LOUnsigned Lower (Carry Clear)C == 0
MIMInusN == 1
PLPLus or ZeroN == 0
VSOverflow (V Set)V == 1
VCNo Overflow (V Clear)V == 0
HIUnsigned HigherC == 1 and Z == 0
LSUnsigned Lower or SameC == 0 or Z == 1
GESigned Greater Than or EqualN == V
LTSigned Lesser Than N != V
GTSigned Greater Than Z == 0 and N == V
LESigned Less Than or Equal ToZ==1 or N != V
ALAlways

So you could have an instruction BXNE r3, which will branch to r3, only if the previous comparison did not set the Zero condition code.

Conditional Execution
Let's look at a single if-then loop to see how to run a simple conditional in Assembly.
 1         .section        .data
 2         .align 2
 3 higher:
 4         .asciz "Yes, r2 is higher than or equal to r1\n"
 5 lower:
 6         .asciz "No, r2 is lower than r1\n"
 7 
 8         .text
 9         .align  2
10         .global main
11         .type   main, %function
12 main:
13         stmfd   sp!, {fp, lr}
14 
15         @ Load some values
16         mov     r1, #32
17         mov     r2, #33
18 
19         @ Check if r2 is lower than r1
20         cmp     r2, r1
21 
22         @ If it is greater or equal, jump ahead
23         bge     greaterOrEqual
24 
25         @ Otherwise it was lower
26         ldr     r0, =lower
27         @ Now skip past to the common point again
28         b       common
29 
30 greaterOrEqual:
31         ldr     r0, =higher
32 
33 common:
34         @ Print the message
35         bl      puts
36 
37         @ Return 0
38         mov     r0, #0
39         ldmfd   sp!, {fp, lr}
40         bx      lr

This program loads values in r2 and r3 and then compares the two. If (r2-r1) is greater than or equal to 0, then we skip to the greaterOrEqual label. Since there are only one kind of instructions (ARM instructions), we don't bother with the "X" variant of the branch. At the greaterOrEqual label, we move the string corresponding to the higher message. Otherwise, we move the string corresponding to the lower message into r0. Either case, we want to get to the common code, which prints the message, and returns 0. If we didn't return to the common case, it would continue running instructions linearly, and load r0 with the higher message. Then, we return 0, like every well behaved main() function should.

The Program Counter holds the address that will be executed next. Since it is also r15, you can directly modify it. This is a terrible idea in most cases, so you should have a good reason to do this. In theory, you could implement logic by conditionally moving values into the Program Counter. For example, you could add 8 to the current value in the PC to skip an instruction. In practice, you will not do this because it is prone to errors. Such logic fails when the program is modified, and the address of instructions changes.


Every Instruction is Conditional
Now that you know what conditional execution looks like, you are ready for a real twist. In ARM architectures, every instruction is conditional. So the ADD, SUB, LDR, STR instructions can all have these condition codes appended to them. We could write instructions containing very few BX or BLX calls.

In practice, however, you should use conditional operations only for a single instruction or two. If you find that a lot of operations act on the same condition, you should branch and skip all these operations. This is much more efficient, since ARM processors fetch more than one instruction and act upon parts of it in parallel. If you can start to execute the next few instructions, your code is considerably faster. So as a rule, use conditional operations only for two instructions or fewer. If you have a longer list of conditional operations that depend on the same condition, pull them out in a separate routine.

In addition to every instruction being conditional, every instruction can set the CPSR flags. For this, you need to append the "S" flag (for Status) to every instruction. Here is what a MOV instruction really looks like:
MOV{condition_code}{S} Rd, shifter_operand
This allows you to set the status flag without performing a CMP, saving another instruction. The status flag can be set on most operations you have seen till now. So you can automatically set the condition codes when performing arithmetic operations.

Exercises
Try your hand at these to see if you understand branches.
  1. What does the MOVGE r0, r2 instruction do?
  2. Rewrite the program to use just one branch rather than two.
  3. Rewrite the program using conditional move to avoid a branch. In this case, a conditional move is perhaps cleaner since the condition only changes one instruction.
  4. Write a function isZero(n) to accept an unsigned number n and to return 1 if n is equal to 0 and n otherwise.
  5. Improve on isZero() to write almostFactorial(n) which returns n*(n-1) if n is not zero, and 1 otherwise.
  6. Try your hand at turning almostFactorial(n) to factorial(n) which returns n! for small values of n. As n gets large, you run out of bits, so you will need to test your function with small inputs.
  7. Write a simple while { .. } loop in C and let GCC compile it to assembly. See if you can make sense of the code. GCC can compile code with optimisations turned on with the "-O2" flag. Try to read the assembly output with and without optimisations. See if you can figure out the optimisation logic.
This marks the end of all the functional pieces of ARM architecture. You should be able to read the disassembly of nearly any ARM program now. Isn't that impressive? Look up unknown instructions in the reference manual. As always, the ARM Reference Manual has more information on all condition codes, conditional execution and instructions that accept conditions and can set the flags.

The final piece will be to integrate this with Android. We will see how to call assembly code from Android to be able to run fast processing directly in assembly.