recoilCTF -- Embedded

Posted at — Jun 6, 2019

embedded-arm-muscles

_{^{*With strong ARMs!*}}

This summer, one of the project leads at my internship decided to host a CTF (Capture The Flag) competition local to the lab that we’re working in. At the time of writing this post, the competition has 35+ challenges, in quite a handful of categories (crypto, reverse engineering, programming challenges, etc.). I’ve sorrily attempted to do a few CTFs in the past, but this is the first time where I’ve really been pretty interested (and also feel I know just enough) in solving some things.

This post will focus on the Embedded category of recoilCTF. There are three challenges in this category (so far), and thanks to my Computer Architecture class last semester, I was able to solve them. In particular, these challenges focus on ARM assembly, a very common instruction set architecture for RISC systems.

Assembly

Before getting started, I’d like to provide a very brief primer/reference for the relevant material in this (and following) challenges. Skip down to the first challenge, Welcome to ARM if you don’t care about/need this.

Abstraction

If I were to take one word away from my computer architecture class, it would be abstraction. Assembly syntax varies much from high-level languages, in that each line is representative of one instruction/action. This is why assembly is “low-level”; it provides much less abstraction than your conventional high-level language (i.e. Python, Java) where one line can be representative of multiple instructions/actions (high abstraction). Assembly allows one to have fine control over what is allocated, when and where. Here is an example of what I mean. Below is equivalent code (C++/ARM) for a function which takes a number as a parameter, and then returns its square:

int square(int num) {
  return num * num;
}

@ fp = frame pointer // vars
@ sp = stack pointer // temp data
@ within brackets [] // contents

square(int):
  str   fp, [sp, #-4]!    @ fp = sp - 4
  add   fp, sp, #0        @ fp = sp + 0
  sub   sp, sp, #12       @ sp = sp - 12
  str   r0, [fp, #-8]     @ addr[fp-8]=r0
  ldr   r3, [fp, #-8]     @ r3 = [fp - 8]
  ldr   r2, [fp, #-8]     @ r2 = [fp - 8]
  mul   r1, r2, r3        @ r1 = r2 * r3
  mov   r3, r1            @ r3 = r1
  mov   r0, r3            @ r0 = r3
  add   sp, fp, #0        @ sp = fp + 0
  ldr   fp, [sp], #4      @ fp = [sp] + 4
  bx    lr                @ return

_{^{Look at all that abstraction!}}

As we can see, the ARM assembly code is much lengthier. Some would call it tedious and confusing. However, within this tediousness, there is much control. We can see all the assignment to particular locations, and exact function. If you’re still confused about some of the above, keep reading.

Syntax

Registers

In ARM, there are 16 registers, quickly accessible locations available to the CPU, which can be referenced by r{number}. Different registers serve different purposes, as seen in the table below:

register	purpose
r0-r3	temporary registers (used for args)
r4-r10 (r12)	scratch registers (global scope)
r11	frame pointer (fp)
r12	specical register, won’t be saved by called routines
r13	stack pointer (sp)
r14	link register (lr)
r15	program counter (pc)

Immediate Values

Numbers prepended with a # symbol are called immediate values. These can serve various purposes, such as adding an offset to a memory address, or an integer constant value.

Instructions

Additionally, there are many instructions - too many to list here. A basic list can be found on this site.

Welcome to ARM

Whew! With all that out of the way, let’s get started. The first challenge in the category deals with basic arithmetic.

`mystery1.S`

.global mystery1
mystery1:
  add r0,r0,r1        @ r0 = r0 + r1
  mul r0,r0,r2        @ r0 = r0 * r2
  sub r0,r3,r0        @ r0 = r3 - r0
  bx lr               @ return

Pseudocode for what is going on is written on the right side of the block for clarity. When put like this, the challenge becomes very simple. I’ll briefly go over each instruction, and what it does:

add: takes the two latter arguments and stores the sum in the first argument
mul: takes the two latter arguments and stores the product in the first argument
sub: takes the two latter arugments and stores the difference in the first argument
bx: branch and exchange instruction set. A “branch” allows you to visit and execute code in a different location in the file. In this case, lr is an alias for r14 or the link register, which usually holds the return address. bx lr together will basically return

We look back to the original challenge prompt and see that 4 numbers are given as parameters (1, 2, 3, 4). If we reference our register table from above, we see that there are conveniently 4 registers to store arguments in, and thus the parameter will correspondingly be stored in r0, r1, r2, and r3. Below, I’ve rewritten the pseudocode with the replaced values for the registers for even more simplification:

  r0 = 1 + 2          @ r0 = 3
  r0 = 3 * 3          @ r0 = 9
  r0 = 4 - 9          @ r0 = -5
  return

And it looks like we have an answer! After plugging everything in and following through with the math, we get -5. Sure enough, this was accepted as correct.

Barrel Shifter

The second challenge in Embedded is a little bit more involved, but not too much so. It requires knowledge of bitwise operations, and possibly conversion between number systems.

`mystery2.S`

.global mystery2
mystery2:
  mov r3,#0xFE000000     @ r3 = 0xFE000000
  mov r1,#0xBE           @ r1 = 0xBE
  mov r2,#0xEF           @ r2 = 0xEF
  add r1,r2,r1,LSL #8    @ r1 = r2 + (r1 * 2^8)
  add r3,r1,r3,ASR #8    @ r3 = r1 + (r3 / 2^8)
  str r3, [r0]
  bx lr

Once again, I’ve written the psuedocode for each line on the right. There are also two new instructions, which I will again briefly describe:

mov: takes the argument in the latter position, and stores it in the register found at the first position
str: takes the argument in the first position (note that this is different from other instructions) and stores it at the second position

Basically we can see that all we have to do to find the flag is perform some more arithmetic, then some bitwise operations, and store the calculated value at r0. This matches up with what the challenge is asking us; to find the address of the argument i.

The first three instructions on lines 3 - 5 simply place the specified value at those registers.

On line 6, we have an add instruction with an additional LSL #8 at the end. LSL stands for “Logical Shift Left”, and #8 tells us the offset. In a logical shift, every bit in the operand is simply moved the given number of bit positions, and the vacant bit-positions are filled (usually with 0s unless it is a circular shift).

On line 7, we have another add instruction, but this time there is an additional ASR #8 at the end. ASR stands for “Arithmetic Shift Right”, and #8 once again tells us to shift by 8 positions. Arithmetic shifts vary from logical shifts in that instead of filling with all 0s when shifting to the right, the MSB (most significant bit) is replicated to fill in all the vacant positions. Because our offset is 8, we are appending 8 1s to the binary value, or 2 additional Fs.

For both lines 6 and 7, we can convert the base-10 offset of 8 into hexadecimal format for an easier calculation. 8 –> 0x100.

0xBE << 0x100 = 0xBE00
0xFE000000 >> 0x100 = 0xFFFE0000

The work for these two lines is below when plugging into the pseudocode:

r1 = 0xEF + 0xBE00 = 0xBEEF
r3 = 0xBEEF + 0xFFFE0000 = 0xFFFEBEEF

The resulting hex is 0xFFFEBEEF, our flag.

b!

This challenge is great because it begins to incorporate some more specific variants of your basic ARM instructions, as well as the general overall structure and flow of assembly. Let’s get into it.

`mystery3.S`

mystery5:
    mov r3,r0            @ r3 = "are_we_having_fun_yet"
    ldrb r2, [r0]        @ r2 = "a"
    mov r0, #0           @ r0 = 0
    cmp r2, #0           @ compare r2 == 0?
    bxeq lr              @ if EQUAL, return
.L4:
    add r0,r0,#1         @ r0 += 1
    ldrb r2, [r3, #1]!   @ r3 = r3 + 1 | r2 = mem[r3]
    cmp r2, #0           @ compare r2 == 0?
    bne .L4              @ if NOT EQUAL branch to .L4
    bx lr                @ return

Going through the psuedocode step by step, r0 contains the first and only argument passed, the string “are_we_having_fun_yet”. Using our knowledge from pevious challenges, we can easily see that from the mov instruction on line 2, this is now stored in r3. From here on out, there are a couple of instructions that are new again. Here’s another quick description:

ldr: loads the second argument into the first (ldrb is explained further down)
cmp: compares two arguments to see if the first is equal/not equal, less/greater than, or less/greater than or equal to the second
bxeq: another branch instruction like we saw in the first two challenges. bxeq however will branch to the second argument only if the previous compare evaluated that the first argument was EQUAL to the second
bne: branch to the second argument only if the first argument is NOT EQUAL to the second

Moving onto line 3, you should now know what ldr does, but what about ldrb? What does the b mean? It turns out that it stands for byte, specifically that the first byte of the contents of the second argument is loaded into the first. In most standard text encodings (ASCII, ISO-8895-1, UTF-8), one character is equal to one byte, so we can conclude that r2 now holds “a”, the first character of the argument in r0.

On line 4 we do another mov before arriving at our cmp instruction on line 5. From above, we can determine what is going on here. We look to see if r2 contains 0; if it does, we’ll return and exit. However we know from just earlier that r2 contains the character ‘a’, so we move on.

.L4: is not an instruction, but rather an important component of assembly structure. Line 7 contains what is known as a label, which basically does what it sounds like - creates a label (which can be named anything) in our code. Labels are important because they not only provide clarity to what the assembly code following the label might do, but also create placeholders for their location in the file. In this way, we can combine the use of labels, cmp, and branch instructions, to construct the equivalent of a conditional loop. This will become important in a moment.

Line 8 adds 1 to what’s in r0 (at this point 0 as it was set in line 4).

The instruction at 9 looks like it’s another variation of the ldr instruction we discussed earlier. We know what the b appended to the end does, and the #1 is an immediate value, but what about the added !? Think about it for a bit, or do a bit of research on your own before clicking on the spoiler below.

The `!` means "Register write-back": the base register is used to calculate the address of the transfer, and is updated.

While sounding a bit complicated, it’s really not too bad. If it helps, you can think about its functionality like a before operand increment (++x). In this type of increment, the value is updated to +1 more before it is retrieved, i.e. if you have:

1
2


int x = 1;
printf("%d", ++x);

the print will display 2.

Converting this logic to the assembly, the index within the brackets is incremented to the next index, and then stored/deferenced into r2. (In this case the pseudocode may be clearer, so take a look at that).

On the next line (10) is another compare, again looking at what is stored in r2, and 0. This will evaluate to false - BUT! Remember how I said earlier that labels and loops will be important? Now’s the time to talk about that.

Because cmp evaluates to false, the bne on line 11 will go back up to .L4, and continue running from there. We can see through the fall through (continuing) behavior that this will happen quite often; the ldrb on line 9 will get the next character, store in r2, and as a letter is not equal to 0 (or NULL in ASCII), continue going back up to .L4 (essentially looping).

We also have to remember the add instruction on line 8. This should remind you of something in a higher-level programming language that is commonly used for various applications. As we continue looping, the value in r0 will be incremented by 1 on each fall through, becoming a counter.

Following all this, will there ever be a point in which the contents of r2 will be 0 or NULL such that the loop stops? The answer is yes. As r3 is a finite string, we’re going to eventually reach the end, and the next loaded character after the “t” in “yet” will be 0 (NULL). At this point only, will the bne instruction fall through, and we will instead bx lr to return.

If we follow how many times the loop executes to get through every letter, we get 21, which is what will be returned. Thus, this is our flag.

Andrew Chu

Discussion on CTFs, general CS, and misc.