Friday, September 09, 2011

Android ARM Assembly: Stack and Functions (Part 5)

This is part five in a series on learning ARM assembly on Android. This part covers the stack and function calling.

Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
Part 4: Gnu tools for assembly; GCC and GDB
=> Part 5: Stack and Functions
Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles follow in series, each article builds on the previous.

Stack
The stack is a linear data structure in memory that holds function-specific information. When a function call is made, the stack is grown, and we place local variables on the stack. When a function returns, we shrink the stack, and the local variables are lost. The stack is a temporary structure. For permanent non-constant variables, we need to allocate space on the heap. The heap is an operating system (OS) concept that we'll cover later. It is an OS concept because the processor is unaware of its presence. It is a programming paradigm rather than a facility provided by the silicon on the chip. The stack, on the other hand, is a processor-level concept. Life is a lot simpler if the processor supports stack primitives.

The stack grows up: to allocate space on the stack, you subtract multiples of 4 from the Stack Pointer. If you need 5 variables, you need to subtract 20 from the Stack Pointer. (In many documents, the memory is laid out from highest address to the lowest address. In such documents, the stack is said to grow downwards. Unfortunately, this is a standard convention even though humans think of stacks growing upward, so I explain everything with the stack growing upward.)

If you write memory as a linear list of addresses, you can see why this makes sense. Here is the memory laid out from top to bottom as a linear list going from the lowest address to the highest address. All addresses and data are in hexadecimal.
Data Location
ab 0000
cd 0004
de 0008
xx ...
f0 0080
f0 0084
f2 0088
f3 008c
f4 0090 <- SP
xx ...
ff ffff
The Stack Pointer is pointing at location 90, which means that it contains the value 0x90. The value at the top of the stack is f4, and this is the contents of memory location 0x90.

If we want to allocate two variables, we need to subtract 4 * 2 = 8 from the stack pointer. Say we store values 99 and AA at these locations, then the stack looks like this:
Data Location
ab 0000
cd 0004
de 0008
xx ...
f0 0080
f0 0084
99 0088 <- SP
AA 008c
f4 0090
xx ...
ff ffff
We ended up clobbering the existing values at those locations. To deallocate space from the stack, you add to the SP. Every allocation on the stack should be matched by a de-allocation. Otherwise you get out of sync, and read the wrong value. Functions start out by storing all the local registers on the stack. Then, on top of the stack, they allocate function-specific data.

Data Location
ab 0000
cd 0004
de 0008
xx ...
current function's stack 0080 <- SP
SP 0084
LR 0088
PC 008c
previous function's stack 0090
xx ...
ff ffff
As a result, the new function has space to grow, and when it returns, the previous variable values are moved from the stack to the registers. From the perspective of the calling function, nothing has changed.

Writing Functions
Let's look at a program to see how variables are stored on the stack. I'll use a long program, because we need a main function and another function that we can change to our liking. The main() function has a fixed signature, so we need something different.
 1         .text
 2         .align  2
 3         .global multiplyByTen
 4         .type   multiplyByTen, %function
 5 multiplyByTen:
 6         stmfd   sp!, {fp, ip, lr}
 7         mov     r3, r0, asl #3
 8         add     r0, r3, r0, asl #1
 9         ldmfd   sp!, {fp, ip, lr}
10         bx      lr
11         .size   multiplyByTen, .-multiplyByTen
12 
13         .section        .rodata
14         .align  2
15 .LC0:
16         .ascii  "The number is %d\012\000"
17 
18         .text
19         .align  2
20         .global main
21         .type   main, %function
22 main:
23         stmfd   sp!, {fp, lr}
24 
25         mov     r0, #32
26         bl      multiplyByTen
27 
28         mov     r1, r0
29         ldr     r0, .L3
30         bl      printf
31 
32         mov     r0, #0
33         ldmfd   sp!, {fp, lr}
34         bx      lr
35 
36         .align  2
37 .L3:
38         .word   .LC0
39         .size   main, .-main
The program is long, but you should be able to understand most of it by now.
This is the function we made:
 5 multiplyByTen:
 6         stmfd   sp!, {fp, ip, lr}
 7         mov     r3, r0, asl #3
 8         add     r0, r3, r0, asl #1
 9         ldmfd   sp!, {fp, ip, lr}
10         bx      lr
This function returns the input multiplied by 10 through its creative uses of shifts. Multiplying y by 10 is the same as (y << 3 + y << 1) = (8y + 2y). The function is made to illustrate how function calling works. The first four inputs to a function are kept in r0-r3. Here we have just one input, so it is in r0. Line 7 moves 8*r0 to r3, and line 8 adds 2*r0 to this, and stores it back in r0. The return value of the function is kept in r0. However, we also have line 6 and line 9. What are they doing?
 6         stmfd   sp!, {fp, ip, lr}
This is a multi-register move operation. This moves the registers FP,IP,LR into the area specified by the register SP. Since SP is the stack pointer, this is the same as pushing registers FP,IP,LR to the top of the stack in a single operation. FP is another name for r11. Once this is done, the stack pointer is updated since it has an exclamation mark. The order of specifying the registers doesn't matter. ARM processors have a bitmask to specify which registers are written, and they are always written in the same order, irrespective of the order you specify them in. GCC might complain if you specify them in the wrong order, though. STM is the base operation, there are variants depending on how you want the stack to grow. STM is STore Multiple, and it comes in variants like IA: Increment After, IB: Increment Before, DA: Decrement After, DB: Decrement Before. These describe when the stack pointer is updated (before writing to memory, or after), and whether it is incremented or decremented. These operations also have handy mnemonics describing their use for operating with stacks. Linux uses a Full Descending type of a stack. Full because the stack pointer points to the last used location, and Descending because the stack grows downward, when memory listed from highest address to lowest address. If you use the FD versions of these operations, your code will interoperate well with other libraries.
 9         ldmfd   sp!, {fp, ip, lr}
This is another multi-register move that undoes the action on line 6. This reads back the values that were written earlier, popping them from the stack. The combined effect of lines 6 and 9 is to store the important register values on the stack and then restore them. This allows the function to modify them in the main body. As with STMFD, the values are specified in a bitmask, so they are always read in the same order.

The function semantics require that r0-r3 are the only registers that can be clobbered during a function call. If you need more than four registers in a function, you need to include them in the STMFD and LDMFD lines to ensure they are restored before exit. If you need a lot of local variables in a function, you need to keep the rest in memory using load and store commands. You can safely keep variables past the stack, in locations like SP-4, SP-8. When your function returns, the stack pointer is restored and the next function can reuse this storage. You can either grow the stack by subtracting multiples of 4 from it, or you can refer to offsets from the stack. The ARM Procedure call Standard mentions that r4-r8, r10, r11, and the SP need to be preserved in a subroutine. So if you are modifying these registers, ensure that you store them at the start and load them back before the end of your routine.

If you need to pass more than four input values to a function, you need to store them on the stack. The function needs to read the exact number of values from the stack. Finally, if you need to return more than one value, you need to modify values in the stack, or return them in r0-r3, and hope your code is never called by anyone else's code.

Exercises
Try experimenting with the following:
  1. Break the function at various points with GDB and examine the memory starting at the Stack Pointer. Does the stack pointer point to the last variable on the stack, or past the last variable on the stack?
  2. Try removing the STMFD and LDMFD in the multiplyByTen function. Does the function still work?
  3. Change multiplyByTen to return the value of SP. Print it using '%p' in the printf. Does it match what GDB says?
  4. Change the multiplyByTen to modify IP, LR and FP in the function and assert that they are indeed restored. Verify the modification and the restoring through GDB.
  5. Rewrite the function to use LDR and STR rather than multi-register moves. You will need to modify the stack pointer with each load and store.
The ARM Reference Manual has more information on mutiple loads and stores, including information about the IA,IB,DA,DB, and FD versions, and what they mean.