Monday, September 05, 2011

Android ARM Assembly: A trivial program (Part 2)

This is part two of a series on learning ARM assembly on Android. This part walks through an ARM assembly program.

Part 1: Motivation and device set up
=> Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
Part 4: Gnu tools for assembly; GCC and GDB
Part 5: Stack and Functions
Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles follow in series; each article builds on the previous one.

Hello world assembly program

The first program that most languages introduce is the Hello World program. If you followed Part 1, you typed the Hello World program in C. Now generate the assembly language for that example using the command given at the end of Part 1.

Here is the assembly language program in full:
 1 	.cpu arm9tdmi
 2 	.fpu softvfp
 3 	.eabi_attribute 20, 1
 4 	.eabi_attribute 21, 1
 5 	.eabi_attribute 23, 3
 6 	.eabi_attribute 24, 1
 7 	.eabi_attribute 25, 1
 8 	.eabi_attribute 26, 2
 9 	.eabi_attribute 30, 6
10 	.eabi_attribute 18, 4
11 	.file	"hello.c"
12 	.section	.rodata
13 	.align	2
14 .LC0:
15 	.ascii	"Hello World\000"
16 	.text
17 	.align	2
18 	.global	main
19 	.type	main, %function
20 main:
21 	@ Function supports interworking.
22 	@ args = 0, pretend = 0, frame = 8
23 	@ frame_needed = 1, uses_anonymous_args = 0
24 	mov	ip, sp
25 	stmfd	sp!, {fp, ip, lr, pc}
26 	sub	fp, ip, #4
27 	sub	sp, sp, #8
28 	str	r0, [fp, #-16]
29 	str	r1, [fp, #-20]
30 	ldr	r0, .L3
31 	bl	puts
32 	mov	r3, #0
33 	mov	r0, r3
34 	sub	sp, fp, #12
35 	ldmfd	sp, {fp, sp, lr}
36 	bx	lr
37 .L4:
38 	.align	2
39 .L3:
40 	.word	.LC0
41 	.size	main, .-main
42 	.ident	"GCC: (Debian 4.3.2-1.1) 4.3.2"
43 	.section	.note.GNU-stack,"",%progbits
As you can see, that is quite a lot of code for a simple program. Luckily, most of it is boilerplate. Let's break it down piece by piece.

Declaration and options
 1 	.cpu arm9tdmi
 2 	.fpu softvfp
 3 	.eabi_attribute 20, 1
 4 	.eabi_attribute 21, 1
 5 	.eabi_attribute 23, 3
 6 	.eabi_attribute 24, 1
 7 	.eabi_attribute 25, 1
 8 	.eabi_attribute 26, 2
 9 	.eabi_attribute 30, 6
10 	.eabi_attribute 18, 4
11 	.file	"hello.c"
The first eleven lines are assembler directives that describe the target and the source file; none of them produces an instruction. You can ignore them for now. In case you are curious, they specify the CPU type, the way we want the Floating Point Unit (FPU) to operate, and options for the ARM Embedded Application Binary Interface (EABI). The filename is specified on line 11.

Declaring constants
12 	.section	.rodata
13 	.align	2
14 .LC0:
15 	.ascii	"Hello World\000"
The string "Hello World" is specified as a constant in assembly on lines 12-15. It is in the read-only data section (.section .rodata), it is aligned on a four-byte boundary (.align 2), and it is written as an ASCII string with an explicit terminating zero byte (\000). Alignment is very important in assembly language programming. On ARM, the argument to .align is a power of two, so .align 2 means 2^2 = 4-byte alignment. You must pay careful attention to whether data needs to be word aligned (4-byte), halfword aligned (2-byte) or byte aligned (1-byte). ARM instructions are one word long and must be word aligned. In general, you can't go wrong with word alignment for the code and data in this series, so if you are uncertain, add a .align 2 at the top of data and functions.
The data is specified as a string in assembly, though the assembler writes it out as bytes behind the scenes.
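To make the alignment arithmetic concrete, here is a minimal C sketch. It assumes the GNU assembler's behaviour on ARM, where .align n pads to the next multiple of 2^n bytes. The helper name align_up is made up for this illustration and is not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of 2^n bytes,
 * which is what ".align n" asks the GNU assembler to do on ARM. */
uint32_t align_up(uint32_t addr, unsigned n)
{
    assert(n < 32);                          /* the shift below needs n < 32 */
    uint32_t mask = (UINT32_C(1) << n) - 1;  /* n = 2 gives mask 3 */
    return (addr + mask) & ~mask;
}
```

For example, align_up(13, 2) gives 16, while align_up(12, 2) stays at 12. The string on line 15 is 12 bytes long including its zero terminator, so whatever follows it is already on a word boundary.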

Declaring functions
16 	.text
17 	.align	2
18 	.global	main
19 	.type	main, %function
20 main:
The program consists of a single function called main. Program code always goes in the text section, thus the declaration on line 16. It is word aligned (line 17). main is a global symbol: line 18 makes it visible to the linker and to other files in the program. Finally, line 19 marks it as a function, and line 20 is the label 'main' where the function begins.

Registers and data
Before we look at the function contents, it is useful to know what the ARM architecture is like. ARM processors are an example of Reduced Instruction Set Computing (RISC). This means that there are relatively few instructions, and most instructions operate on registers. For user programs, there are 16 registers, called r0, r1, r2, ..., r15. Each register is 32 bits wide. Registers correspond roughly to variables, though they don't have a data type. Arithmetic and logical operations are performed on registers; memory is reached only through load and store instructions.

The registers r12-r15 are special:
r12: IP, or Intra-Procedure-call scratch register. The linker may use this register as a scratch register between a procedure and the procedure it calls, so its value is not guaranteed to survive a function call. Within a function it is free for temporary use, which is how line 24 of our program uses it.
r13: SP, or Stack Pointer. This register points to the top of the stack. The stack is an area of memory used for local function-specific storage. This storage is reclaimed when the function returns. To allocate space on the stack, we subtract from the stack pointer. To allocate one 32-bit value, we subtract 4 from the stack pointer.
r14: LR, or Link Register. This register holds the return address of a subroutine. When a subroutine is called, the LR is filled with the address of the instruction that follows the call.
r15: PC, or Program Counter. This register tracks the address of the instruction that is being executed.

Here are some data move instructions:
24 	mov	ip, sp
This moves the value from sp (r13) to ip (r12). This achieves ip = sp.
32 	mov	r3, #0
This moves the value 0 into r3. This achieves r3 = 0.
In addition to moving values between registers, you can load values from memory into registers, and store registers into memory. Here are instructions that achieve this:
28 	str	r0, [fp, #-16]
This stores the contents of r0 into the memory location at address (fp - 16). Since memory is addressed by bytes, and registers are 4 bytes each, memory offsets are usually multiples of 4. Treating fp as a byte address, this is the same as the C statement *(int *)(fp - 16) = r0
   	ldr	r0, [fp]
This instruction does not appear in the program above; it is shown here as the matching load. It loads the data from the memory pointed to by register fp into register r0. This is the same as the C statement r0 = *(fp)
25 	stmfd	sp!, {fp, ip, lr, pc}
This is a multi-register store. It stores the registers FP, IP, LR and PC into the memory just below the address in SP. Since SP is the stack pointer, this is the same as pushing FP, IP, LR and PC onto the stack in a single operation. The exclamation mark means that SP is updated afterwards to point at the last value pushed. The FD suffix stands for Full Descending: the stack grows towards lower addresses, and SP points at the last item pushed.
35 	ldmfd	sp, {fp, sp, lr}
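This is a multi-register load that undoes the action on line 25. Line 34 first points SP at the saved FP, and then this instruction reads back the values that were written earlier. The combined effect of lines 25 and 35 is to store the important register values on the stack and then restore them, which allows the function to modify them in its main body. We don't restore IP: the value saved from IP on line 25 was the original SP (see line 24), so it is loaded straight back into SP instead. There is no exclamation mark here because SP is itself in the register list. The saved PC is not read back.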
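The save and restore sequence is easier to follow with concrete addresses. Below is a small C sketch that simulates lines 24-28 and 34-35 on a toy register file and memory. Everything in it (the reg and mem arrays, word_at, stmfd_wb, ldmfd, frame_round_trip) is invented for this illustration; it models only the behaviour described above, not a real ARM core.

```c
#include <assert.h>
#include <stdint.h>

enum { FP = 11, IP = 12, SP = 13, LR = 14, PC = 15 };
#define MEM_WORDS 64

static uint32_t reg[16];
static uint32_t mem[MEM_WORDS];   /* byte address a lives in mem[a / 4] */

static uint32_t *word_at(uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &mem[addr / 4];
}

/* stmfd base!, {list}: full-descending store with writeback.
 * The highest-numbered register ends up at the highest address. */
static void stmfd_wb(int base, unsigned list)
{
    uint32_t addr = reg[base];
    for (int i = 15; i >= 0; i--) {
        if (list & (1u << i)) {
            addr -= 4;
            *word_at(addr) = reg[i];
        }
    }
    reg[base] = addr;             /* the "!" writes the new address back */
}

/* ldmfd base, {list}: load upwards from base, no writeback. */
static void ldmfd(int base, unsigned list)
{
    uint32_t addr = reg[base];
    for (int i = 0; i <= 15; i++) {
        if (list & (1u << i)) {
            reg[i] = *word_at(addr);
            addr += 4;
        }
    }
}

/* Returns 1 if fp, sp and lr come back unchanged after the round trip. */
int frame_round_trip(uint32_t sp0, uint32_t fp0, uint32_t lr0)
{
    reg[SP] = sp0; reg[FP] = fp0; reg[LR] = lr0;
    reg[0] = 1;                                   /* stand-in for argc */

    reg[IP] = reg[SP];                            /* mov   ip, sp */
    stmfd_wb(SP, (1u << FP) | (1u << IP) | (1u << LR) | (1u << PC));
    reg[FP] = reg[IP] - 4;                        /* sub   fp, ip, #4 */
    reg[SP] = reg[SP] - 8;                        /* sub   sp, sp, #8 */
    *word_at(reg[FP] - 16) = reg[0];              /* str   r0, [fp, #-16] */

    reg[LR] = 0;                                  /* a call such as bl puts overwrites lr */

    reg[SP] = reg[FP] - 12;                       /* sub   sp, fp, #12 */
    ldmfd(SP, (1u << FP) | (1u << SP) | (1u << LR));
    return reg[SP] == sp0 && reg[FP] == fp0 && reg[LR] == lr0;
}
```

Calling frame_round_trip(128, 0x1000, 0x2000) stores the old FP at address 112, the old SP (via IP) at 116, LR at 120 and PC at 124, and puts r0 in the local slot at 108. The final load then brings FP, SP and LR back to their starting values even though LR was overwritten in between.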

Manipulating data
Assembly instructions to manipulate data are very basic: there are arithmetic operations such as ADD and SUB, and logical operations such as AND and ORR (bitwise OR). ARM assembly language operations are of the type: OPERATION ARG1, ARG2, ARG3. This performs the operation ARG1 = ARG2 OPERATION ARG3. In the above program, you see some basic arithmetic.

26 	sub	fp, ip, #4
This performs the action fp = ip - 4.

Function calls and returns
Assembly language is written as a flat set of instructions: there is very little structure once the instructions are written to memory in the computer. In order to make code modular, functions can be written. Without the protection of the C compiler, assembly programs must manage their own function calling.
A basic function has the following structure:
 	.text
 	.align	2
 	.global	functionName
 	.type	functionName, %function
 functionName:
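 	mov	ip, sp                  @ Remember the caller's stack pointer
 	stmfd	sp!, {fp, ip, lr, pc}   @ Save registers on the stack
 	sub	fp, ip, #4              @ Set up the frame pointer
 	sub	sp, sp, #8              @ Space for local variables
 
 	@ Function body goes here
 
 	sub	sp, fp, #12             @ Point sp at the saved fp
 	ldmfd	sp, {fp, sp, lr}        @ Restore fp, sp and lr
 	bx	lr                      @ Return to the caller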
Subroutines are required to preserve the registers r4-r11 and the stack pointer; r0-r3 and r12 (IP) may be freely overwritten. So if you need to use the other registers (r4 onwards), you should save them on the stack before overwriting them and restore them before returning. The exact function call convention is listed in the ARM procedure call standard.
In order to call a function, you Branch and Link using the BL instruction. The return address is placed in the LR register. To return from a function, you branch to the address held in the LR register. This is done with a BX, rather than by copying LR into PC, so that execution can correctly move between ARM and Thumb instructions. (Ignore ARM and Thumb differences for now; they will be made clear later.)
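As a rough model of the call and return mechanism, here is a short C sketch. The struct cpu and the functions bl and bx_lr are hypothetical names for this illustration only. Here pc simply means "the address of the next instruction to run", and the call is assumed to be made from ARM state, where every instruction is 4 bytes long.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint32_t pc;    /* address of the next instruction to run */
    uint32_t lr;    /* link register */
    int thumb;      /* 1 when executing Thumb code, 0 for ARM */
};

/* bl target, executed by an ARM instruction located at address at. */
void bl(struct cpu *c, uint32_t at, uint32_t target)
{
    assert(at % 4 == 0);    /* ARM instructions are word aligned */
    c->lr = at + 4;         /* return address: the instruction after the bl */
    c->pc = target;
}

/* bx lr: bit 0 of lr selects the instruction set, the rest is the address. */
void bx_lr(struct cpu *c)
{
    c->thumb = (int)(c->lr & 1u);
    c->pc = c->lr & ~UINT32_C(1);
}
```

With a bl at address 0x8000 calling a function at 0x9000, lr becomes 0x8004, and the bx lr at the end of the function sends execution back to 0x8004 in ARM state. If lr held 0x8005 instead, bx lr would return to the same address but switch to Thumb state, which is the interworking that a plain copy into PC would not reliably provide on older cores.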

Hello World functionality
Put together, lines 30-33 print out the message Hello World and set the return value of main. Let's break this down instruction-by-instruction to see how this is achieved.
30 	ldr	r0, .L3
31 	bl	puts
32 	mov	r3, #0
33 	mov	r0, r3
Line 30 loads the word stored at label .L3 into register r0. That word (line 40) is the address of .LC0, the Hello World string, so r0 ends up pointing at the string. The function calling convention is that the first four arguments are stored in r0-r3, and subsequent arguments are stored on the stack. The function to put a string on the screen is called puts, and it accepts just one argument: the string to be printed. The address of this string is stored in the first register: r0.
Line 31 calls the puts function, which reads its argument from r0 and prints the string on the screen. After calling the function, we must expect the registers r0-r3 to be trashed. The return value of puts is in r0, but we don't care about it.

Lines 32 and 33 together achieve r0 = 0. This is the value that the main function returns.
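For comparison, here is a C sketch that mirrors lines 30-33 one instruction at a time. The names LC0, L3 and hello_body are chosen here to echo the labels in the listing; this is an illustration of what the assembly does, not the hello.c from Part 1.

```c
#include <stdio.h>

static const char LC0[] = "Hello World";   /* .LC0: the string in .rodata */
static const char *const L3 = LC0;         /* .L3:  .word .LC0 */

int hello_body(void)
{
    const char *r0 = L3;   /* ldr r0, .L3   (r0 = address of the string) */
    puts(r0);              /* bl  puts      (first argument is in r0) */
    int r3 = 0;            /* mov r3, #0 */
    return r3;             /* mov r0, r3    (return value goes in r0) */
}
```

The extra hop through L3 reflects how the assembly reaches the string: the instruction loads a word stored next to the code, and that word holds the string's address.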

Final word
You are now capable of reading ARM assembly and understanding the main elements in the program. As an exercise, you could try reducing the size of the code while keeping the functionality intact.

In the next article, we will examine each piece in more detail.