This is part four of a multi-part series. This part covers GCC, the GNU compiler, and GDB, the GNU debugger. Both are essential development tools.
Part 1: Motivation and device set up
Part 2: A walk-through of a simple ARM assembly program
Part 3: Registers, memory, and addressing modes
=> Part 4: GNU tools for assembly: GCC and GDB
Part 5: Stack and Functions
Part 6: Arithmetic and Logical Expressions
Part 7: Conditional Execution
Part 8: Assembly in Android code
The articles follow in series, and each article builds on the previous one.
If you already know how to use the GNU tools in the Intel world, they work the same way on ARM machines.
GCC
GCC is the default compiler on Linux systems. It is a versatile compiler. We need very few options for our ARM assembly development.
$ gcc -S source.c
This is perhaps the most useful way in which we can use gcc. It creates an assembly source.s file which corresponds to the C source code. This is great for learning how gcc translates specific C constructs to assembly.
$ gcc -o hello source.c
Compile C file source.c into an executable called hello.
$ gcc -o hello source.s
Assemble and link the assembly file source.s into an executable called hello. This is probably the form you will use most often.
The full GCC manual is available online.
GAS
Gas is the GNU assembler, and is the default assembler on Linux systems. My tutorials don't call gas directly. If you invoke gas yourself, you get an object file that you need to link against glibc using the linker. Invoking gcc on assembly source code calls gas and the linker, so I prefer to do that.
Knowledge of the assembler helps when you want to use specific assembler directives. The gas manual is available online.
GDB
While programming in assembly, you often need something to show you the state of the machine. You need to see every register and every memory location. This is where GDB comes in. It is free and easy to use. Here is a gentle introduction to gdb that covers most assembly needs.
$ gdb hello
Start gdb with the executable called hello. It prints a helpful message, and drops you to the (gdb) prompt. This prompt is where you type all commands.
(gdb) disassemble main
Dump of assembler code for function main:
0x000083d0 <+0>: push {r11, lr}
0x000083d4 <+4>: add r11, sp, #4
0x000083d8 <+8>: sub sp, sp, #8
0x000083dc <+12>: str r0, [r11, #-8]
0x000083e0 <+16>: str r1, [r11, #-12]
0x000083e4 <+20>: ldr r0, [pc, #20] ; 0x8400
0x000083e8 <+24>: bl 0x82e8
0x000083ec <+28>: mov r3, #0
0x000083f0 <+32>: mov r0, r3
0x000083f4 <+36>: sub sp, r11, #4
0x000083f8 <+40>: pop {r11, lr}
0x000083fc <+44>: bx lr
0x00008400 <+48>: andeq r8, r0, r8, lsl #9
End of assembler dump.
This command shows the disassembly of any function. In this case, we looked at main, the entry point of our hello world program. There are some familiar instructions here already. Disassembly works on any executable. You don't need the source code for the program.
(gdb) break *0x000083e4
Breakpoint 1 at 0x83e4
This sets a breakpoint at the specified memory address. When you run the program, the execution will break at that location, and you will be dropped back on the gdb shell to inspect the state.
(gdb) run
Starting program: /home/user/ARM/hello
Breakpoint 1, 0x000083e4 in main ()
Alright, we started the program and it broke exactly where we asked it to. This is a great time to examine the registers and the memory.
(gdb) info registers
r0 0x1 1
r1 0xbed9a924 3201935652
r2 0xbed9a92c 3201935660
r3 0x83d0 33744
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x40025000 1073893376
r11 0xbed9a7d4 0xbed9a7d4
r12 0xbed9a840 3201935424
sp 0xbed9a7c8 0xbed9a7c8
lr 0x4003b508 1073984776
pc 0x83e4 0x83e4
cpsr 0x60000010 1610612752
This command shows you the register state. As you can see, there are the standard registers r0-r12, and SP, LR, and PC. You can also see the status register CPSR printed in full. The function calling convention on ARM is that the first four arguments to a function are stored in r0-r3. Let's verify that this is the case.
The function we are looking at is main(int argc, char* argv[]). It has two arguments, argc and argv, which should be in r0 and r1 respectively. r0 should contain argc, the number of command-line arguments given. We invoked the program with no arguments, so the command-line arguments consist of only the program name. argc should be 1, which is what r0 contains.
argv is trickier. It is a pointer to an array of pointers to strings. This is partly confirmed by r1, which holds a large hex number: 0xbed9a924. It could be a memory location. Let's find out.
(gdb) x/w 0xbed9a924
0xbed9a924: 0xbed9aa0d
The "x/w" stands for "eXamine memory / parse as Word". Memory locations could contain anything, so we parse this one as a 32-bit word to start out. The contents look a lot like another address in the same region. Let's see what the next few words hold.
(gdb) x/12w 0xbed9a924
0xbed9a924: 0xbed9aa0d 0x00000000 0xbed9aa22 0xbed9aa32
0xbed9a934: 0xbed9aa3d 0xbed9aa47 0xbed9af37 0xbed9af43
0xbed9a944: 0xbed9af80 0xbed9af8f 0xbed9afa2 0xbed9afab
x/12w stands for "eXamine memory / show me 12 Words". As you can see, almost all of these words look like addresses. Let's see what is at the first address, 0xbed9aa0d.
(gdb) x/w 0xbed9aa0d
0xbed9aa0d: 0x6d6f682f
Hmm, that doesn't look like an address. This should be a string, and rather than converting the 0x6d 0x6f 0x68 ... to ASCII myself, I'll let gdb help me out.
(gdb) x/s 0xbed9aa0d
0xbed9aa0d: "/home/user/ARM/hello"
We are asking gdb to "eXamine memory / as String". gdb knows that C strings are null terminated, so it helpfully walks over the successive memory locations, interpreting each byte as ASCII, until it comes to a null terminator. So we have verified that argv[0] is a pointer to a string containing the program name. Let's see what the next few memory addresses hold.
(gdb) x/10s 0xbed9aa0d
0xbed9aa0d: "/home/user/ARM/hello"
0xbed9aa22: "SHELL=/bin/bash"
0xbed9aa32: "TERM=xterm"
0xbed9aa3d: "USER=user"
0xbed9aa47: "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:...
0xbed9ab0f: ":*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tl...
0xbed9abd7: "eb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=0"...
0xbed9ac9f: ":*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.ti...
0xbed9ad67: "v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=0...
0xbed9ae2f: "yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv...
We "eXamine 10 memory locations as String", and we find that we have run past the end of argv. We are seeing the environment variables that are specified by the Bash shell, including the name of the shell, the username, and the colors for the different file types. I forgot, where were we?
(gdb) where
#0 0x000083e4 in main ()
We can find out where we are in the execution by asking 'where', though for this simple program we could just as easily have looked at the program counter register.
(gdb) disassemble main
Dump of assembler code for function main:
0x000083d0 <+0>: push {r11, lr}
0x000083d4 <+4>: add r11, sp, #4
0x000083d8 <+8>: sub sp, sp, #8
0x000083dc <+12>: str r0, [r11, #-8]
0x000083e0 <+16>: str r1, [r11, #-12]
=> 0x000083e4 <+20>: ldr r0, [pc, #20] ; 0x8400
0x000083e8 <+24>: bl 0x82e8
0x000083ec <+28>: mov r3, #0
0x000083f0 <+32>: mov r0, r3
0x000083f4 <+36>: sub sp, r11, #4
0x000083f8 <+40>: pop {r11, lr}
0x000083fc <+44>: bx lr
0x00008400 <+48>: andeq r8, r0, r8, lsl #9
End of assembler dump.
gdb prints a helpful arrow marking where we are. We can set another breakpoint if we like. After a long debugging session, you might forget which breakpoints you have set.
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x000083e4
breakpoint already hit 1 time
info registers
3 breakpoint keep y 0x000083f0
info registers
You can see all breakpoints with 'info breakpoints' and you can delete breakpoints with 'delete x', where x is the number of the breakpoint. When deleting breakpoints, gdb doesn't produce any output if it is successful.
(gdb) delete 3
A very helpful technique when debugging for loops is to run some commands automatically when a breakpoint is hit. This is done with the 'commands' directive as follows.
(gdb) break *0x000083ec
Breakpoint 4 at 0x83ec
(gdb) commands 4
Type commands for breakpoint(s) 4, one per line.
End with a line saying just "end".
>info registers
>end
(gdb)
Now, when the breakpoint is hit, gdb will automatically run the 'info registers' command. Let's continue running this program so it can hit the next breakpoint.
(gdb) continue
Continuing.
Hello World
Breakpoint 4, 0x000083ec in main ()
r0 0xc 12
r1 0x0 0
r2 0x40153228 1075130920
r3 0x83d0 33744
r4 0x0 0
r5 0x0 0
r6 0x0 0
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x40025000 1073893376
r11 0xbed9a7d4 0xbed9a7d4
r12 0x0 0
sp 0xbed9a7c8 0xbed9a7c8
lr 0x83ec 33772
pc 0x83ec 0x83ec
cpsr 0x60000010 1610612752
gdb ran past the puts(), and printed "Hello World" on the screen. It hit the breakpoint, and automatically showed us the registers. Great. Let's finish up by continuing.
(gdb) continue
Continuing.
[Inferior 1 (process 17307) exited normally]
(gdb) info registers
The program has no registers now.
The program is done. We can't examine registers or memory because it isn't running anymore.
The full GDB documentation is available online.
Links
Now that you know how to examine registers and memory, you can write ARM programs and verify that they do the right thing. You can break at various locations and verify that your load, store, and move instructions are working as expected.
The GNU tools are ubiquitous and mature. Once you learn how to use gdb and gcc on ARM, you can easily use the same tricks on another platform like Intel. Here are all the manual links again:
- GCC manual
- Gas (GNU Assembler) manual
- GDB manual