Thursday, September 08, 2011

Android ARM Assembly: GCC and GDB (Part 4)

This is part four of a multi-part series. This part covers GCC, the GNU C compiler, and GDB, the GNU debugger. Both are essential development tools.

Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
=> Part 4: GNU tools for assembly: GCC and GDB
Part 5: Stack and Functions
Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code

The articles are meant to be read in order; each one builds on the previous.

If you already know how to use the GNU tools on Intel machines, they work the same way on ARM.

GCC
GCC is the default compiler on Linux systems. It is versatile, but we need very few of its options for ARM assembly development.
$ gcc -S source.c
This is perhaps the most useful way to use gcc. It creates an assembly file, source.s, which corresponds to the C source code. This is great for learning how gcc translates specific C constructs to assembly.
$ gcc -o hello source.c
Compile C file source.c into an executable called hello.
$ gcc -o hello source.s
Assemble and link the assembly file source.s into an executable called hello. This is probably the form you will use most often.
The full GCC manual can be downloaded online.

GAS
Gas is the GNU assembler (the as command), and is the default assembler on Linux systems. My tutorials don't call gas directly. If you invoke gas yourself, you get an object file that you then need to link against glibc using the linker. Invoking gcc on assembly source code calls both gas and the linker, so I prefer to do that.

Knowledge of the assembler helps when you want to use specific assembler directives. The gas manual is available online.

GDB
While programming in assembly, you often need something to show you the state of the machine. You need to be able to see every register and every memory location. This is where GDB comes in. It is free and easy to use. Here is a gentle introduction to gdb that covers most assembly needs.
$ gdb hello
Start gdb with the executable called hello. It prints a helpful message, and drops you to the (gdb) prompt. This prompt is where you type all commands.
(gdb) disassemble main
Dump of assembler code for function main:
   0x000083d0 <+0>:	push	{r11, lr}
   0x000083d4 <+4>:	add	r11, sp, #4
   0x000083d8 <+8>:	sub	sp, sp, #8
   0x000083dc <+12>:	str	r0, [r11, #-8]
   0x000083e0 <+16>:	str	r1, [r11, #-12]
   0x000083e4 <+20>:	ldr	r0, [pc, #20]	; 0x8400 
   0x000083e8 <+24>:	bl	0x82e8 
   0x000083ec <+28>:	mov	r3, #0
   0x000083f0 <+32>:	mov	r0, r3
   0x000083f4 <+36>:	sub	sp, r11, #4
   0x000083f8 <+40>:	pop	{r11, lr}
   0x000083fc <+44>:	bx	lr
   0x00008400 <+48>:	andeq	r8, r0, r8, lsl #9
End of assembler dump.
The disassemble command shows the assembly code of any function. In this case, we looked at the disassembly of main, the entry point of our hello world program. There are some familiar instructions here already. Disassembly can be done for any executable. You don't need the source code for the program.
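Two details in this dump are worth a closer look: the address in the comment after the ldr, and the odd-looking andeq at the very end. Here is a minimal sketch in C of the arithmetic behind both. The function names are mine, not part of any toolchain, and the encoder handles only the one instruction form that appears in the dump.

```c
#include <stdint.h>

/* In ARM (not Thumb) state, reading pc gives the address of the current
 * instruction plus 8, so [pc, #offset] refers to insn_addr + 8 + offset. */
uint32_t arm_pc_relative(uint32_t insn_addr, int32_t offset)
{
    return insn_addr + 8u + (uint32_t)offset;
}

/* Encode "and<cond> Rd, Rn, Rm, lsl #shift" with the S bit clear.
 * For this form the opcode, I, S, and shift-type fields are all zero. */
uint32_t arm_encode_and_lsl(uint32_t cond, uint32_t rd, uint32_t rn,
                            uint32_t rm, uint32_t shift)
{
    return (cond << 28) | (rn << 16) | (rd << 12) | (shift << 7) | rm;
}
```

arm_pc_relative(0x83e4, 20) gives 0x8400, which matches the address gdb prints after the ldr. arm_encode_and_lsl(0, 8, 0, 8, 9), with 0 being the eq condition, gives 0x00008488. So the "andeq" at 0x8400 is not really an instruction: it is a data word that gdb disassembled along with the code. Its value is presumably the address of the string that the ldr loads into r0 before the call.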
(gdb) break *0x000083e4
Breakpoint 1 at 0x83e4
This sets a breakpoint at the specified memory address. When you run the program, execution will stop at that location, and you will be dropped back to the gdb prompt to inspect the state.
(gdb) run
Starting program: /home/user/ARM/hello 

Breakpoint 1, 0x000083e4 in main ()
Alright, we started the program and it broke exactly where we asked it to. This is a great time to examine the registers and the memory.
(gdb) info registers
r0             0x1	1
r1             0xbed9a924	3201935652
r2             0xbed9a92c	3201935660
r3             0x83d0	33744
r4             0x0	0
r5             0x0	0
r6             0x0	0
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x40025000	1073893376
r11            0xbed9a7d4	0xbed9a7d4
r12            0xbed9a840	3201935424
sp             0xbed9a7c8	0xbed9a7c8
lr             0x4003b508	1073984776
pc             0x83e4	0x83e4 
cpsr           0x60000010	1610612752
This command shows you the register state. As you can see, there are the standard registers r0-r12, and SP, LR, and PC. You can also see the status register CPSR printed in full. The function calling convention on ARM is that the first four arguments to a function are stored in r0-r3. Let's verify that this is the case.
The function we are looking at is main(int argc, char* argv[]). It has two arguments, argc and argv, which should be in r0 and r1 respectively. r0 should contain argc, the number of command-line arguments given. We invoked the program with no arguments, so the command-line arguments consist of only the program name. argc should be 1, which is what r0 contains.
argv is trickier. It is a pointer to an array of pointers to strings. This is partly confirmed by r1, which holds a large hex number: 0xbed9a924. It could be a memory location. Let's find out.
(gdb) x/w 0xbed9a924
0xbed9a924:	0xbed9aa0d
The "x/w" stands for eXamine memory/ parse as Word. Memory locations could contain anything, so we want to parse it as a 32 bit word to start out. The contents look a lot like the address itself. Let's see what the next few contents hold.
(gdb) x/12w 0xbed9a924
0xbed9a924:	0xbed9aa0d	0x00000000	0xbed9aa22	0xbed9aa32
0xbed9a934:	0xbed9aa3d	0xbed9aa47	0xbed9af37	0xbed9af43
0xbed9a944:	0xbed9af80	0xbed9af8f	0xbed9afa2	0xbed9afab
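x/12w stands for eXamine memory/show me 12 Words. As you can see, apart from the second word, the contents of memory all look like addresses. The second word is zero: it is the NULL pointer that marks the end of the argv array. Let's see what is at the first address, 0xbed9aa0d.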
(gdb) x/w 0xbed9aa0d
0xbed9aa0d:	0x6d6f682f
Hmm, that doesn't look like an address. This should be a string, and rather than converting the 0x6d 0x6f 0x68 ... to ascii myself, I'll let gdb help me out.
(gdb) x/s 0xbed9aa0d
0xbed9aa0d:	 "/home/user/ARM/hello"
We are asking gdb to "eXamine memory / as String". gdb knows that C strings are null terminated, so it helpfully walks over the successive memory locations, interpreting each byte as ASCII, until it comes to a null terminator. So we have verified that argv[0] is a pointer to a string containing the program name. Let's see what the next few memory addresses hold.
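For the curious, the manual conversion that x/s spared us looks roughly like this in C. It is a sketch that assumes a little-endian machine, which the dump itself suggests: the string starts with '/' (0x2f), and 0x2f is the lowest byte of the word 0x6d6f682f. The function name is made up for this example.

```c
#include <stdint.h>

/* Unpack a 32-bit word into the four bytes that a little-endian machine
 * stores at increasing addresses, then NUL-terminate the result. */
void word_to_bytes_le(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xffu);
    out[4] = '\0';
}
```

Feeding it 0x6d6f682f fills the buffer with "/hom", the first four characters of the string gdb printed.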
(gdb) x/10s 0xbed9aa0d
0xbed9aa0d:	 "/home/user/ARM/hello"
0xbed9aa22:	 "SHELL=/bin/bash"
0xbed9aa32:	 "TERM=xterm"
0xbed9aa3d:	 "USER=user"
0xbed9aa47:	 "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:...
0xbed9ab0f:	 ":*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tl...
0xbed9abd7:	 "eb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=0"...
0xbed9ac9f:	 ":*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.ti...
0xbed9ad67:	 "v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=0...
0xbed9ae2f:	 "yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv...
We "eXamine 10 memory locations as String", and we find that we have run past the end of argv. We are seeing the environment variables that are specified by the Bash shell, including the name of the shell, the username, and the colors for the different file types. I forgot, where were we?
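The layout we just walked through (the argv pointers, a NULL, then the environment pointers) can be sketched in C. The helper names below are hypothetical. The second helper assumes the environment array sits immediately after argv's NULL terminator, which is what the memory dump above shows on this system; the C standard does not promise it.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated pointer array such as argv. */
size_t count_entries(char *const *vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}

/* Assumption: the environment pointers start right after argv's NULL
 * terminator, as in the dump above. Not guaranteed by the C standard. */
char *const *env_after_argv(int argc, char *const *argv)
{
    return argv + argc + 1;
}
```

Called from main, count_entries(argv) should equal argc. With argv at 0xbed9a924 and argc equal to 1, env_after_argv points two words further, at 0xbed9a92c. That is exactly the value r2 held at the breakpoint, which fits the common Linux convention of passing the environment pointer as a third argument to main.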
(gdb) where
#0  0x000083e4 in main ()
We can find out where we are in the execution by asking 'where', though for this simple program we could just as easily have looked at the Program Counter register.
(gdb) disassemble main
Dump of assembler code for function main:
   0x000083d0 <+0>:	push	{r11, lr}
   0x000083d4 <+4>:	add	r11, sp, #4
   0x000083d8 <+8>:	sub	sp, sp, #8
   0x000083dc <+12>:	str	r0, [r11, #-8]
   0x000083e0 <+16>:	str	r1, [r11, #-12]
=> 0x000083e4 <+20>:	ldr	r0, [pc, #20]	; 0x8400 
   0x000083e8 <+24>:	bl	0x82e8 
   0x000083ec <+28>:	mov	r3, #0
   0x000083f0 <+32>:	mov	r0, r3
   0x000083f4 <+36>:	sub	sp, r11, #4
   0x000083f8 <+40>:	pop	{r11, lr}
   0x000083fc <+44>:	bx	lr
   0x00008400 <+48>:	andeq	r8, r0, r8, lsl #9
End of assembler dump.
gdb shows a helpful arrow showing where we are. We can set another breakpoint if we like. After a long debugging session, you might forget which breakpoints you have set.
(gdb) info breakpoints 
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0x000083e4 
	breakpoint already hit 1 time
        info registers
3       breakpoint     keep y   0x000083f0 
        info registers
You can see all breakpoints with 'info breakpoints' and you can delete breakpoints with 'delete x', where x is the number of the breakpoint. When deleting breakpoints, gdb doesn't produce any output if it is successful.
(gdb) delete 3
A very helpful technique when debugging for loops is to run some commands automatically when a breakpoint is hit. This is done with the 'commands' directive as follows.
(gdb) break *0x000083ec
Breakpoint 4 at 0x83ec
(gdb) commands 4
Type commands for breakpoint(s) 4, one per line.
End with a line saying just "end".
>info registers
>end
(gdb) 
Now, when the breakpoint is hit, gdb will automatically run the 'info registers' command. Let's continue running this program so it can hit the next breakpoint.
(gdb) continue
Continuing.
Hello World

Breakpoint 4, 0x000083ec in main ()
r0             0xc	12
r1             0x0	0
r2             0x40153228	1075130920
r3             0x83d0	33744
r4             0x0	0
r5             0x0	0
r6             0x0	0
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x40025000	1073893376
r11            0xbed9a7d4	0xbed9a7d4
r12            0x0	0
sp             0xbed9a7c8	0xbed9a7c8
lr             0x83ec	33772
pc             0x83ec	0x83ec 
cpsr           0x60000010	1610612752
The program ran past the puts() call and printed "Hello World" on the screen. It then hit the breakpoint, and gdb automatically showed us the registers. Great. Let's finish up by continuing.
(gdb) continue 
Continuing.
[Inferior 1 (process 17307) exited normally]
(gdb) info registers 
The program has no registers now.
The program is done. We can't examine registers or memory because it isn't running anymore.
The full GDB documentation is available online.
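One value in the register dumps that we did not decode is cpsr. As a rough sketch, here is how its condition flags and mode field can be pulled apart in C. The struct and function names are my own. The bit positions are the standard ARM ones: N, Z, C, and V in bits 31 to 28, and the processor mode in the low five bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* The condition flags and the processor mode field of a CPSR value. */
struct cpsr_flags {
    bool n, z, c, v;   /* Negative, Zero, Carry, oVerflow */
    uint32_t mode;     /* low five bits; 0x10 is User mode */
};

struct cpsr_flags decode_cpsr(uint32_t cpsr)
{
    struct cpsr_flags f;
    f.n = (cpsr >> 31) & 1u;
    f.z = (cpsr >> 30) & 1u;
    f.c = (cpsr >> 29) & 1u;
    f.v = (cpsr >> 28) & 1u;
    f.mode = cpsr & 0x1fu;
    return f;
}
```

For the value 0x60000010 seen at both breakpoints, this reports Z and C set, N and V clear, and mode 0x10, meaning the program is running in User mode.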

Links
Now that you know how to examine registers and memory, you can write ARM programs and verify that they do the right thing. You can break at various locations and verify that your load, store, and move instructions are working as expected.

The GNU tools are ubiquitous and mature. Once you learn how to use gdb and gcc on ARM, you can easily use the same tricks on another platform like Intel. Here are all the manual links again:
  1. GCC manual
  2. Gas (GNU Assembler) manual
  3. GDB manual