A version of this lab notebook entry was published in the International Journal of Proof-of-Concept or Get The Fuck Out (PoC||GTFO 0x16, page 39).
Many thanks to @lynn for proof-reading a draft of this document.
endrift has recently written an article on a new method she discovered for dumping the GBA's BIOS, different from the MidiKey2Freq method currently used. This article is about a third method I've discovered that is different from those two.
I've recently developed a fascination with the Game Boy Advance. The hardware is simple compared to modern handhelds, and the CPU uses an architecture I'm already familiar with (ARM7TDMI), which makes it a rather fun toy to play with. The GBA is a console where cycle counting is important. To learn more about the hardware, I have been reading documentation that others have produced (like Martin Korth's GBATEK) and writing small programs to test edge cases of the hardware that I didn't quite understand. One such edge case was the BIOS ROM.
The BIOS ROM is a piece of read-only memory that sits at the beginning of the GBA's address space. In addition to being used for initialization, it also provides a handful of routines accessible through software interrupts. It is rather small (16 KiB). Games running on the GBA are prevented from reading the BIOS: only code running from the BIOS itself can read the BIOS. Attempts to read the BIOS from elsewhere only return the last successfully fetched BIOS opcode, so from the game's point of view the BIOS is just a repeating stream of garbage.
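To make the observed behaviour concrete, here is a minimal C sketch of what a data read of the BIOS region returns. It models only what is observed, not how the hardware achieves it. The `bios_view` type and both function names are hypothetical, and the `reader_is_bios_code` flag is an assumed stand-in for whatever mechanism the hardware really uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical behavioural model of the 16 KiB BIOS as seen by the CPU. */
typedef struct {
    const uint32_t *rom;       /* BIOS image as 4096 little-endian words */
    uint32_t last_bios_opcode; /* most recent opcode fetched from the BIOS */
} bios_view;

/* Word index of a BIOS address (addresses are assumed to be in 0..0x3FFF). */
static uint32_t bios_word_index(uint32_t addr) {
    return (addr & 0x3FFCu) >> 2;
}

/* An opcode fetch from inside the BIOS always succeeds and is remembered. */
uint32_t bios_fetch_opcode(bios_view *v, uint32_t addr) {
    v->last_bios_opcode = v->rom[bios_word_index(addr)];
    return v->last_bios_opcode;
}

/* A data read of a BIOS address: BIOS code sees the real contents, anyone
   else only gets the last successfully fetched BIOS opcode. */
uint32_t bios_data_read(const bios_view *v, uint32_t addr, bool reader_is_bios_code) {
    return reader_is_bios_code ? v->rom[bios_word_index(addr)] : v->last_bios_opcode;
}
```

Because a game's reads all return `last_bios_opcode`, every address it reads yields the same word until the BIOS runs again.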
This naturally leads to the question: how does the BIOS ROM actually protect itself from improper access? Most emulators look at the CPU's program counter and allow or disallow access based on whether the current instruction is inside or outside the BIOS memory region. However, this can't possibly be how the real BIOS ROM determines a valid access, because wiring the PC up to the BIOS ROM chip would have been prohibitively complex. Thus a simpler technique must have been used.
A normal ARM7TDMI chip exposes a number of signals to the memory system in order to access memory. A full list of them is available in the ARM7TDMI reference manual (page 3-3), but the ones that interest us at the moment are nOPC and A[31:0]. A[31:0] is a 32-bit value representing the address that the CPU wants to read. nOPC is a signal that is 0 if the CPU is reading an instruction and 1 if the CPU is reading data. From this, a very simple scheme for protecting the BIOS ROM could be devised. If nOPC is 0 and A[31:0] is within the BIOS memory region, unlock the BIOS. Otherwise, if nOPC is 0 and A[31:0] is outside the BIOS memory region, lock the BIOS. An nOPC of 1 has no effect on the current lock state. This protects the BIOS because the CPU only emits nOPC = 0 with an A[31:0] inside the BIOS when it intends to execute instructions within the BIOS. Thus only BIOS instructions have access to the BIOS.
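The guessed scheme is small enough to write down as a one-bit state machine. The following C sketch is a hypothetical model of that guess, not a description of the actual GBA circuitry. The type and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* The BIOS occupies 0x00000000-0x00003FFF (16 KiB). */
#define BIOS_END 0x00004000u

/* Hypothetical model of the guessed lock; not the real GBA circuitry. */
typedef struct {
    bool unlocked;
} bios_lock;

bool in_bios(uint32_t addr) {
    return addr < BIOS_END;
}

/* Call once per bus cycle with the CPU's A[31:0] and nOPC outputs.
   nOPC == 0: opcode fetch; nOPC == 1: data access. */
void bios_lock_observe(bios_lock *lock, uint32_t addr, int nopc) {
    if (nopc == 0) {
        /* Only opcode fetches change the lock: a fetch inside the BIOS
           unlocks it, a fetch anywhere else locks it. */
        lock->unlocked = in_bios(addr);
    }
    /* Data accesses (nOPC == 1) leave the lock state unchanged. */
}

/* Would a data read of addr in this cycle see the real BIOS contents? */
bool bios_data_read_allowed(const bios_lock *lock, uint32_t addr) {
    return in_bios(addr) && lock->unlocked;
}
```

The model shows why a game cannot read the BIOS directly. The game's own opcode fetches from cartridge ROM or RAM keep the lock closed, and the data reads it issues can never reopen it.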
While the above is a guess at how the GBA actually implements BIOS locking, it matches the observed behaviour.
This answers our question on how the BIOS protects itself. But it leads to another: Are there any edge-cases due to this behaviour that allow us to easily dump the BIOS? It turns out the answer to this question is yes.
A[31:0] falls within the BIOS when the CPU intends to execute code within the BIOS. The code does not actually have to be executed; the CPU only has to intend to execute it. The ARM7TDMI is a pipelined processor. To keep the pipeline filled, the CPU prefetches instructions two ahead of the one it is currently executing, which is 8 bytes ahead in ARM state and 4 bytes ahead in Thumb state. This results in an off-by-two error. While the BIOS sits at 0x00000000 to 0x00003FFF, ARM instructions from 0xFFFFFFF8 (i.e. -8) up to, but not including, 0x00003FF8 have access to the BIOS!
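The off-by-two arithmetic can be checked with a small C sketch. It assumes the locking scheme guessed earlier and the standard ARM7TDMI prefetch distance of 8 bytes in ARM state and 4 bytes in Thumb state. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIOS_END 0x00004000u /* BIOS is 0x00000000-0x00003FFF */

/* While the instruction at exec_pc executes, the CPU fetches two instructions
   ahead: exec_pc + 8 in ARM state, exec_pc + 4 in Thumb state. uint32_t
   arithmetic wraps modulo 2^32, just like the 32-bit address bus. */
uint32_t prefetch_address(uint32_t exec_pc, bool thumb) {
    return exec_pc + (thumb ? 4u : 8u);
}

/* Under the guessed lock, an instruction runs with the BIOS unlocked if the
   opcode fetch issued while it executes falls inside the BIOS. */
bool executes_with_bios_unlocked(uint32_t exec_pc, bool thumb) {
    return prefetch_address(exec_pc, thumb) < BIOS_END;
}
```

For example, `executes_with_bios_unlocked(0xFFFFFFFC, true)` is true because the wrapped-around fetch lands on 0x00000000. In ARM state, 0x00003FF4 is the last qualifying address, since an instruction at 0x00003FF8 would fetch 0x00004000.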
This means that if you could place instructions at memory locations 0xFFFFFFF8 to 0xFFFFFFFF, they would execute with BIOS protection disabled. Unfortunately, there is no RAM backing these memory locations. This complicates the attack somewhat, so we now need to talk about what happens when the CPU reads unmapped memory.
When the CPU reads unmapped memory, the value it actually reads is the residual data left on the bus by the previous read (also known as "open bus"). The instruction prefetcher is often the last thing to read from the bus, so the bus often holds the last prefetched instruction. This makes it simple to make instructions appear to exist at our unmapped memory location. We place them in memory immediately after the instruction that jumps to the unmapped location, so that the prefetcher reads them onto the bus. Some people call this "prefetch stuffing".
One thing to note is that the bus is 32 bits wide, which means that you can stuff either one ARM instruction or two Thumb instructions (Thumb instructions are 16 bits wide). Since we need to do a memory read followed by a return, we have to use Thumb instructions.
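The word 0x47706800 used by this method comes from encoding the two Thumb instructions and packing them into one bus word. The C sketch below does exactly that. The encoder helpers are hypothetical and cover only the two forms used here: a word `LDR` with an immediate offset, and `BX`.

```c
#include <stdint.h>

/* Thumb "LDR Rd, [Rn, #imm5 * 4]" (load word, immediate offset):
   bits 15-11 = 01101, bits 10-6 = imm5, bits 5-3 = Rn, bits 2-0 = Rd. */
uint16_t thumb_ldr_imm(unsigned rd, unsigned rn, unsigned imm5) {
    return (uint16_t)(0x6800u | ((imm5 & 31u) << 6) | ((rn & 7u) << 3) | (rd & 7u));
}

/* Thumb "BX Rm": bits 15-7 = 010001110, bits 6-3 = Rm (r0-r15), bits 2-0 = 0. */
uint16_t thumb_bx(unsigned rm) {
    return (uint16_t)(0x4700u | ((rm & 15u) << 3));
}

/* The bus is little-endian: the halfword at the 4-byte aligned address sits in
   bits 0-15 and the following halfword in bits 16-31. */
uint32_t pack_thumb_pair(uint16_t first, uint16_t second) {
    return (uint32_t)first | ((uint32_t)second << 16);
}
```

`ldr r0, [r0]` encodes as 0x6800 and `bx lr` (lr is r14) as 0x4770. With `ldr` at the lower address 0xFFFFFFFC and `bx lr` at 0xFFFFFFFE, the pair packs to 0x47706800.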
Where we jump from is also important, because different memory regions put slightly different things on the bus. For example, if the prefetcher reads OAM at a 4-byte aligned location, the bus contains [PC+4]; [PC+6] (lower halfword; upper halfword). This is exactly what we need. Unfortunately, I had trouble executing code from OAM (and was too lazy to investigate why), so I had to use a different memory region. We can't use main RAM, palette memory, VRAM or cartridge ROM, because the bus would contain [PC+4]; [PC+4]. That is just the same instruction twice, which is not what we want. The last remaining memory region is internal work RAM.
Internal work RAM (IWRAM) behaves slightly differently on the bus. When prefetching Thumb instructions, only half of the bus is changed, which means that half of the value from the penultimate memory access is still visible. In practice, this means we have to execute a memory read or write before our jump to supply the other half of our instruction pair.
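The differences between regions can be captured in a simplified C model of what the 32-bit bus holds after a single Thumb opcode fetch. It is a sketch based on the behaviour described above, and it ignores wait states and ARM-state fetches. The `mem_region` enum and the function name are hypothetical. `mem_word` is the little-endian 32-bit word in memory that contains `fetch_addr`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    REGION_OAM,   /* whole word reaches the bus */
    REGION_16BIT, /* main RAM, palette, VRAM, cartridge ROM: halfword duplicated */
    REGION_IWRAM  /* only the addressed half of the bus changes */
} mem_region;

/* Returns the bus value after a Thumb opcode fetch from fetch_addr. */
uint32_t bus_after_thumb_fetch(mem_region r, uint32_t fetch_addr,
                               uint32_t mem_word, uint32_t old_bus) {
    bool upper = (fetch_addr & 2u) != 0; /* which halfword of the word */
    uint32_t half = upper ? (mem_word >> 16) : (mem_word & 0xFFFFu);
    switch (r) {
    case REGION_OAM:
        return mem_word;                        /* [PC+4]; [PC+6] when aligned */
    case REGION_IWRAM:
        return upper ? (old_bus & 0x0000FFFFu) | (half << 16)
                     : (old_bus & 0xFFFF0000u) | half;
    case REGION_16BIT:
    default:
        return half | (half << 16);             /* [PC+4]; [PC+4] */
    }
}
```

Consider the situation in this exploit. The `str` leaves 0x68006800 on the bus, and then `bx r1`, sitting at an unaligned IWRAM address, fetches the word holding 0x6800 and 0x4770. The model then yields 0x47706800. The same fetch from a 16-bit region would yield 0x47704770.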
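And that's it! To summarize:

1. We want to put `ldr r0, [r0]; bx lr` (0x47706800) onto the bus. As we are starting from IWRAM, we do this by a combination of prefetch stuffing and a memory write.
2. We jump to the unmapped address 0xFFFFFFFC in Thumb mode.
3. The CPU tries to fetch instructions from 0xFFFFFFFC and 0xFFFFFFFE and instead reads 0x47706800 from the bus.
4. The `ldr r0, [r0]` instruction at 0xFFFFFFFC executes. The prefetch of 0x00000000 that happens while it executes unlocks the BIOS, so it reads the unlocked memory.
5. The `bx lr` instruction at 0xFFFFFFFE executes, returning to our code.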
```asm
    .thumb
    .section .iwram, "ax", %progbits
    .func read_bios, read_bios
    .global read_bios
    .type read_bios, %function
    .balign 4

@ u32 read_bios(u32 bios_address):
@   bios_address is passed in r0; the word read from the BIOS is returned in r0.
    .thumb_func
read_bios:
    ldr r1, =0xFFFFFFFD
    ldr r2, =0x68006800
    str r2, [r1]    @ <-- Puts "ldr r0, [r0]" (0x6800) onto open bus (penultimate memory access).
    bx r1           @ <-- Jump to 0xFFFFFFFC in Thumb mode (bit 0 of r1 selects Thumb).
    ldr r0, [r0]
    bx lr           @ <-- The prefetcher puts this return instruction onto open bus.
    .balign 4
    .endfunc
    .ltorg
```
Full working program available here.
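Dumping the whole BIOS is then a matter of calling the routine once per word. The C sketch below shows one way to do that. The `dump_bios` helper and the host-side `fake_read_bios` stand-in are hypothetical, and the author's full program may organize this differently.

```c
#include <stdint.h>

#define BIOS_SIZE 0x4000u /* 16 KiB */

/* Matches the assembly routine's signature: u32 read_bios(u32 bios_address). */
typedef uint32_t (*bios_word_reader)(uint32_t bios_address);

/* Hypothetical helper: fills out[0 .. BIOS_SIZE/4) by reading one aligned
   32-bit word at a time. out must have room for 4096 words. */
void dump_bios(bios_word_reader read_word, uint32_t *out) {
    for (uint32_t addr = 0; addr < BIOS_SIZE; addr += 4) {
        out[addr / 4] = read_word(addr);
    }
}

/* Hypothetical host-side stand-in so the loop can be exercised off-target. */
uint32_t fake_read_bios(uint32_t bios_address) {
    return bios_address ^ 0xA5A5A5A5u;
}
```

On the GBA, you would declare `extern uint32_t read_bios(uint32_t);` and call `dump_bios(read_bios, buffer)` with a 16 KiB buffer.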
Cycle by cycle, the sequence looks like this, where $ is the address of the `str r2, [r1]` instruction:

| Instruction | Cycle | PC | What's happening | A[31:0] | nOPC | Bus contents |
|---|---|---|---|---|---|---|
| `str r2, [r1]` | 1 | $ | Prefetch of $+4 | $+4 | 0 | [$+4] |
| | 2 | $ | Data store of 0x68006800 | 0xFFFFFFFD | 1 | 0x68006800 |
| `bx r1` | 1 | $+2 | Prefetch of $+6 | $+6 | 0 | 0x47706800 |
| | 2 | $+2 | Pipeline reload (0x6800 is read into pipeline) | 0xFFFFFFFC | 0 | 0x47706800 |
| | 3 | $+2 | Pipeline reload (0x4770 is read into pipeline) | 0xFFFFFFFE | 0 | 0x47706800 |
| `ldr r0, [r0]` | 1 | 0xFFFFFFFC | Prefetch of 0x00000000 | 0x00000000 | 0 | [0x00000000] |
| | 2 | 0xFFFFFFFC | Data read of [r0] | r0 | 1 | [r0] |
| `bx lr` | 1 | 0xFFFFFFFE | Prefetch of 0x00000002 | 0x00000002 | 0 | [0x00000002] |
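The A[31:0] and nOPC columns of the table can be replayed through the guessed lock to confirm that the data read happens while the BIOS is unlocked. The C sketch below does this. The concrete values $ = 0x03000004 and r0 = 0x00000100 are hypothetical choices, as are the type and function names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BIOS_END 0x00004000u

/* One table row: the address on A[31:0] and the nOPC signal. */
typedef struct {
    uint32_t addr;
    int nopc;
} bus_cycle;

/* The table's cycles, assuming $ = 0x03000004 (IWRAM) and r0 = 0x00000100. */
const bus_cycle exploit_cycles[] = {
    {0x03000008u, 0}, /* str r2, [r1]: prefetch of $+4        */
    {0xFFFFFFFDu, 1}, /* str r2, [r1]: data store             */
    {0x0300000Au, 0}, /* bx r1: prefetch of $+6               */
    {0xFFFFFFFCu, 0}, /* bx r1: pipeline reload               */
    {0xFFFFFFFEu, 0}, /* bx r1: pipeline reload               */
    {0x00000000u, 0}, /* ldr r0, [r0]: prefetch of 0x00000000 */
    {0x00000100u, 1}, /* ldr r0, [r0]: data read of [r0]      */
    {0x00000002u, 0}, /* bx lr: prefetch of 0x00000002        */
};

/* Replays cycles[0..index] through the guessed lock (opcode fetches set the
   state, data accesses leave it alone) and reports whether the BIOS is
   unlocked during cycle `index`. */
bool unlocked_during(const bus_cycle *cycles, size_t index, bool initially_unlocked) {
    bool unlocked = initially_unlocked;
    for (size_t i = 0; i <= index; i++) {
        if (cycles[i].nopc == 0) {
            unlocked = cycles[i].addr < BIOS_END;
        }
    }
    return unlocked;
}
```

The data read at index 6 sees the BIOS unlocked, because the last opcode fetch before it (at index 5, address 0x00000000) fell inside the BIOS. Every cycle up to index 4 runs locked.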
Some comments were made on the original gist.