Showing posts with label reverse engineering. Show all posts

2009-08-28

More 8032 / 8051 trickery

Don't worry, the (serial) printf handler is coming. In the meantime, here is some more hard-to-spot Mediatek 8032/8051 compiler trickery with lcall (geez, couldn't they at least agree on the base architecture name, and use variant names that make sense?):

ROM:C32C                 lcall   ROM_B613
ROM:C32F lcall copy_4_bytes_to_xRAM ; input: DPTR = dest xRAM address
ROM:C32F ; PC = src bytes address
ROM:C32F ; ---------------------------------------------------------------------------
ROM:C332 .byte 0x00,0xC,0x00,0x00 ; 0
ROM:C336 ; ---------------------------------------------------------------------------
ROM:C336 mov DPTR, #0xFFFC
ROM:C339 lcall ROM_B613
ROM:C33C lcall copy_4_bytes_to_xRAM ; input: DPTR = dest xRAM address
ROM:C33C ; PC = src bytes address
ROM:C33C ; ---------------------------------------------------------------------------
ROM:C33F .byte 0,0xC,0,0 ; 0
ROM:C343 ; ---------------------------------------------------------------------------
ROM:C343 mov R6, #0x12
ROM:C345 mov R7, #0


When people start using (or have to use) instructions extensively in ways they were not designed for (shouldn't execution always pick up right after the lcall instruction itself once the call returns?), you know that you have a poorly designed CPU, no matter what people manage to achieve with it.

Yeah, it's another lcall'ed function that doesn't return where it's supposed to and breaks our nice disassembly flow. Thanks for making us manually edit our subroutine ends in IDA Pro, morons!

So, how does it work this time? Similarly to the previous trick actually:

ROM:B498 ; =============== S U B R O U T I N E =======================================
ROM:B498
ROM:B498 ; input: DPTR = dest xRAM address
ROM:B498 ; PC = src bytes address
ROM:B498
ROM:B498 copy_4_bytes_to_xRAM: ; CODE XREF: ROM:2A49p
ROM:B498 ; ROM:2A77p ...
ROM:B498 mov R0, DPL
ROM:B49A mov B, DPH ; DPTRd [DPHd DPLd] -> B R0 = Dest Pointer
ROM:B49D pop DPH
ROM:B49F pop DPL ; DPTRs [DPHs DPLs] = Src Pointer
ROM:B4A1 lcall copy_byte
ROM:B4A4 lcall copy_byte
ROM:B4A7 lcall copy_byte
ROM:B4AA lcall copy_byte
ROM:B4AD clr A
ROM:B4AE jmp @A+DPTR ; DPTRs (= original return PC) has been
ROM:B4AE ; incremented 4 times at this stage
ROM:B4AE ; End of function copy_4_bytes_to_xRAM
ROM:B4AE
ROM:B4AF
ROM:B4AF ; =============== S U B R O U T I N E =======================================
ROM:B4AF
ROM:B4AF
ROM:B4AF copy_byte: ; CODE XREF: copy_4_bytes_to_xRAM+9p
ROM:B4AF ; copy_4_bytes_to_xRAM+Cp ...
ROM:B4AF clr A
ROM:B4B0 movc A, @A+DPTR ; read [DPTRs]
ROM:B4B1 inc DPTR ; DPTRs++ (as well as [DPTRs] -> A)
ROM:B4B2 xch A, DPH
ROM:B4B4 xch A, B ; DPTRs <-> DPTRd (B R0)
ROM:B4B6 xch A, DPH
ROM:B4B8 xch A, R0
ROM:B4B9 xch A, DPL
ROM:B4BB xch A, R0 ; A is still [DPTRs] at this stage
ROM:B4BC movx @DPTR, A ; [DPTRs] -> [DPTRd]
ROM:B4BD inc DPTR ; DPTRd++
ROM:B4BE xch A, DPH ; DPTRs <-> DPTRd
ROM:B4C0 xch A, B
ROM:B4C2 xch A, DPH
ROM:B4C4 xch A, R0
ROM:B4C5 xch A, DPL
ROM:B4C7 xch A, R0
ROM:B4C8 ret
ROM:B4C8 ; End of function copy_byte
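
To make the control flow easier to follow, here is a minimal Python sketch of what the two routines above achieve together. It is a model, not a translation: the function name and its arguments are mine, and the destination address DEST is purely hypothetical, since the listing does not show what ROM_B613 leaves in DPTR before the call.

```python
def copy_4_bytes_to_xram(code, xram, return_addr, dest):
    """Model of the routine: returns the address where execution resumes."""
    src = return_addr              # pop DPH / pop DPL: source = bytes after the lcall
    for _ in range(4):             # the four lcalls to copy_byte
        xram[dest] = code[src]     # movc A, @A+DPTR followed by movx @DPTR, A
        src = (src + 1) & 0xFFFF   # inc DPTR on the source pointer
        dest = (dest + 1) & 0xFFFF # inc DPTR on the destination pointer
    return src                     # clr A / jmp @A+DPTR


code = bytearray(0x10000)          # code space
xram = bytearray(0x10000)          # external RAM
code[0xC33F:0xC343] = bytes([0x00, 0x0C, 0x00, 0x00])  # inline data from the listing

DEST = 0x1000                      # hypothetical destination, not taken from the ROM
resume = copy_4_bytes_to_xram(code, xram, 0xC33F, DEST)
print(hex(resume))                 # 0xc343, the mov R6, #0x12 after the data
```

The lcall at ROM:C33C pushes C33F as its return address, so the four data bytes are read from there and execution resumes at C343, which is exactly where the disassembly picks up again.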


Gotta love these scores of xch instructions. Kind of the 3 cups & one red ball street magician classic. The only thing you need to know though, is that all a sequence like the following 3 lines does:
xch A, register1
xch A, register2
xch A, register1

is simply exchange the contents of register1 and register2, while leaving A unchanged.
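
If you don't want to take my word for it, the three exchanges can be checked with a few lines of Python. This is only a sketch: registers are modelled as dictionary entries, and the helper names xch and swap_via_accumulator are made up for the illustration.

```python
def xch(regs, reg):
    """xch A, reg: exchange the accumulator with the given register."""
    regs["A"], regs[reg] = regs[reg], regs["A"]


def swap_via_accumulator(regs, register1, register2):
    xch(regs, register1)   # A = old register1, register1 = old A
    xch(regs, register2)   # A = old register2, register2 = old register1
    xch(regs, register1)   # A = old A,         register1 = old register2


regs = {"A": 0x11, "DPH": 0x22, "B": 0x33}
swap_via_accumulator(regs, "DPH", "B")
print(regs)   # DPH and B have traded values, A is back to 0x11
```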

I won't say this isn't a nice trick to access a small amount of data right within your program, but I'll take my 68000, with its generous set of general purpose registers, any time over your crappy 8051. It's a wonder really: how come there are so few consumer devices using an embedded 68000, when it would be so much more efficient for embedded applications? If you're gonna jump through this kind of Intel 8032/8051 hoops, it won't be that much different to go RISC all the way!

2009-08-26

Switch statements, 8032/8051 style

Now this next trick is a little bit more clever.

If, when disassembling 8032 / 8051 code, you ever see the start of an lcall'ed routine that looks like this:
ROM:0000B5BA                 pop     DPH             ; Data Pointer, High Byte
ROM:0000B5BC pop DPL ; Data Pointer, Low Byte
and then proceeds to use DPTR for movc instructions, the effective DPTR address used in that routine will be the address right after the lcall (because lcall pushed the return PC onto the stack).
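
As a sanity check on the byte order, here is a small Python sketch of the stack mechanics. It assumes the standard 8051 behaviour, where the 3-byte lcall pushes the low byte of the return address first and the high byte last; the function names are mine.

```python
def lcall(stack, pc):
    """Model of lcall at address pc: push the return address (pc + 3)."""
    ret = (pc + 3) & 0xFFFF
    stack.append(ret & 0xFF)   # low byte is pushed first
    stack.append(ret >> 8)     # high byte ends up on top of the stack


def pop_dph_dpl(stack):
    """Model of pop DPH / pop DPL: rebuild DPTR from the stack."""
    dph = stack.pop()          # top of stack = high byte
    dpl = stack.pop()          # next = low byte
    return (dph << 8) | dpl


stack = []
lcall(stack, 0x6CB3)           # the lcall case_switch_byte from the example code
dptr = pop_dph_dpl(stack)
print(hex(dptr))               # 0x6cb6, the first byte following the lcall
```

Note that the stack is empty again after the two pops, which is why a plain ret can no longer get back to the caller.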

This is effectively used by, hum, some code, to implement switch statements that might be a bit tricky to detect during reverse engineering, as the disassembler obviously expects the lcall to return, and will start disassembling the switch table as if it were code.

Example code:
ROM:00006CA7 ROM_6CA7:                               ; CODE XREF: ROM_6C60+15j
ROM:00006CA7 ; ROM_6C60+1Bj ...
ROM:00006CA7 mov DPTR, #0xC46A
ROM:00006CAA movx A, @DPTR
ROM:00006CAB xrl A, #0x60
ROM:00006CAD jnz ROM_6CF4
ROM:00006CAF mov DPTR, #0xB03E
ROM:00006CB2 movx A, @DPTR
ROM:00006CB3 lcall case_switch_byte
ROM:00006CB3 ; ---------------------------------------------------------------------------
ROM:00006CB6 .word ROM_6CE4
ROM:00006CB8 .byte 0
ROM:00006CB9 .word ROM_6CE4
ROM:00006CBB .byte 0x20
ROM:00006CBC .word ROM_6CE0
ROM:00006CBE .byte 0x2A
ROM:00006CBF .word ROM_6CDC
ROM:00006CC1 .byte 0x2B
ROM:00006CC2 .word ROM_6CDE
ROM:00006CC4 .byte 0x2D
ROM:00006CC5 .word ROM_6CE2
ROM:00006CC7 .byte 0x2F
ROM:00006CC8 .word ROM_6CDA
ROM:00006CCA .byte 0x39
ROM:00006CCB .word ROM_6CD2
ROM:00006CCD .byte 0x5A
ROM:00006CCE .word 0
ROM:00006CD0 .word 0x6CEC
ROM:00006CD2 ; ---------------------------------------------------------------------------
ROM:00006CD2
ROM:00006CD2 ROM_6CD2: ; DATA XREF: ROM_6C60+6Bo
ROM:00006CD2 mov DPTR, #0xB03E
ROM:00006CD5 mov A, #0x30 ; '0'
ROM:00006CD7 movx @DPTR, A
ROM:00006CD8 sjmp ROM_6D54
ROM:00006CDA ; ---------------------------------------------------------------------------


with case_switch_byte being:

ROM:B5BA
ROM:B5BA ; =============== S U B R O U T I N E =======================================
ROM:B5BA
ROM:B5BA ; Input: A = matching case value (byte)
ROM:B5BA
ROM:B5BA case_switch_byte: ; CODE XREF: ROM:4046p
ROM:B5BA ; ROM_4552+178p ...
ROM:B5BA pop DPH ; Data Pointer, High Byte
ROM:B5BC pop DPL ; DPTR = lcall return address
ROM:B5BE mov R0, A
ROM:B5BF
ROM:B5BF loop_through_cases: ; CODE XREF: case_switch_byte+24j
ROM:B5BF clr A
ROM:B5C0 movc A, @A+DPTR ; NB: @ is confusing. It's just A+DPTR
ROM:B5C1 jnz valid_dest ; make sure dest @ != 0
ROM:B5C3 mov A, #1
ROM:B5C5 movc A, @A+DPTR
ROM:B5C6 jnz valid_dest
ROM:B5C8 inc DPTR
ROM:B5C9 inc DPTR ; if null dest, just use the next
ROM:B5C9 ; word as dest address
ROM:B5CA
ROM:B5CA dest_match: ; CODE XREF: case_switch_byte+1Fj
ROM:B5CA movc A, @A+DPTR
ROM:B5CB mov R0, A
ROM:B5CC mov A, #1
ROM:B5CE movc A, @A+DPTR
ROM:B5CF mov DPL, A ; Data Pointer, Low Byte
ROM:B5D1 mov DPH, R0 ; dest into DPTR
ROM:B5D3 clr A
ROM:B5D4 jmp @A+DPTR
ROM:B5D5 ; ---------------------------------------------------------------------------
ROM:B5D5
ROM:B5D5 valid_dest: ; CODE XREF: case_switch_byte+7j
ROM:B5D5 ; case_switch_byte+Cj
ROM:B5D5 mov A, #2
ROM:B5D7 movc A, @A+DPTR
ROM:B5D8 xrl A, R0 ; cmp val with parameter
ROM:B5D9 jz dest_match
ROM:B5DB inc DPTR
ROM:B5DC inc DPTR
ROM:B5DD inc DPTR ; skip 3 bytes to next switch table entry
ROM:B5DE sjmp loop_through_cases
ROM:B5DE ; End of function case_switch_byte
ROM:B5DE
ROM:B5E0
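
For those who prefer to read the logic at a higher level, the table walk can be sketched in Python as follows. This is my reading of the routine, not the original source: each entry is a 2-byte destination (high byte first) followed by a 1-byte case value, and a null destination marks the end of the table, with the next word used as the default target. The table contents are the ones from the example code at ROM:6CB6.

```python
TABLE_ADDR = 0x6CB6   # address right after the lcall at ROM:6CB3
ENTRIES = [(0x6CE4, 0x00), (0x6CE4, 0x20), (0x6CE0, 0x2A), (0x6CDC, 0x2B),
           (0x6CDE, 0x2D), (0x6CE2, 0x2F), (0x6CDA, 0x39), (0x6CD2, 0x5A)]
DEFAULT = 0x6CEC


def build_rom():
    """Lay the switch table out in a 64 KB code space, as in the listing."""
    rom = bytearray(0x10000)
    addr = TABLE_ADDR
    for dest, value in ENTRIES:
        rom[addr:addr + 3] = bytes([dest >> 8, dest & 0xFF, value])
        addr += 3
    rom[addr:addr + 4] = bytes([0, 0, DEFAULT >> 8, DEFAULT & 0xFF])
    return rom


def case_switch_byte(rom, return_addr, value):
    """Return the address the routine ends up jumping to."""
    dptr = return_addr                           # pop DPH / pop DPL
    while True:
        dest = (rom[dptr] << 8) | rom[dptr + 1]
        if dest == 0:                            # null dest: end of table
            return (rom[dptr + 2] << 8) | rom[dptr + 3]
        if rom[dptr + 2] == value:               # xrl A, R0 / jz dest_match
            return dest
        dptr += 3                                # skip to the next entry


rom = build_rom()
print(hex(case_switch_byte(rom, TABLE_ADDR, 0x2B)))   # 0x6cdc
print(hex(case_switch_byte(rom, TABLE_ADDR, 0x99)))   # 0x6cec (default)
```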


The same kind of routine also exists for a word parameter instead of a byte.
Most of the time, the case values will follow some kind of logical order, so if you see a bunch of sequential bytes or words, interleaved with what look like offsets, and preceded by an lcall, you might want to check what's on the other end of that lcall.

OR, the preferred way once you have made your initial pass at identifying code: look for a function that starts by popping DPH and DPL, and seek out all the lcalls that cross-reference it to identify the switch tables.
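
Outside of IDA, a crude version of that search can be done on the raw dump with a few lines of Python. Take it as a heuristic sketch only: it assumes a single-bank ROM image loaded at address 0, it relies on the standard 8051 encodings (D0 83 for pop DPH, D0 82 for pop DPL, 12 hi lo for lcall), and since it scans bytes rather than decoded instructions, it will happily report matches that sit in data. The function names and the tiny synthetic ROM are made up for the demonstration.

```python
POP_DPH_POP_DPL = bytes([0xD0, 0x83, 0xD0, 0x82])   # pop DPH / pop DPL
LCALL_OPCODE = 0x12                                 # lcall addr16 (high byte first)


def find_pop_dptr_routines(rom):
    """Addresses where the pop DPH / pop DPL byte sequence occurs."""
    hits, start = [], 0
    while True:
        idx = rom.find(POP_DPH_POP_DPL, start)
        if idx < 0:
            return hits
        hits.append(idx)
        start = idx + 1


def find_lcalls_to(rom, target):
    """Addresses of byte sequences that look like an lcall to target."""
    return [i for i in range(len(rom) - 2)
            if rom[i] == LCALL_OPCODE
            and ((rom[i + 1] << 8) | rom[i + 2]) == target]


# Tiny synthetic ROM: a candidate routine at 0x80, called from 0x10 and 0x40.
demo = bytearray(0x100)
demo[0x80:0x84] = POP_DPH_POP_DPL
demo[0x10:0x13] = bytes([0x12, 0x00, 0x80])
demo[0x40:0x43] = bytes([0x12, 0x00, 0x80])

for routine in find_pop_dptr_routines(demo):
    callers = find_lcalls_to(demo, routine)
    print(hex(routine), [hex(c) for c in callers])
```

Whatever follows each reported lcall (at caller + 3) is then a candidate switch table or inline data block, to be confirmed by hand.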

Oh, and for those who might wonder: of course, as soon as you pop the return address that lcall pushed onto the stack, the lcall never returns, and behaves exactly like a jump.

Coming next: how the hell are these bloody strings and other data sitting in standalone data sections referenced, when nothing in the disassembly appears to reference their addresses...