Reverse-Engineering in OS X on x86
The other day at work I had a task: to figure out how to change the displayed title of a minimized window in the Dock without actually changing the window's title. (Please trust that I had a very good reason for wanting to do this.)
While there are some excellent articles about how to reverse-engineer under OS X, they're all PowerPC-based. And even though the future of the Mac is x86, it seems like people have lots of anxiety about having to work with it.
I think the problem is not a lack of documentation on x86 assembly, but a surfeit of it. Most of it is Windows- or DOS-centric, usually with the wrong syntax (Intel syntax vs the AT&T syntax that GCC uses), and with the aim of teaching how to write it. But reading x86 assembly really isn't that hard. If all you want to do is learn how to read the code generated by GCC, it's probably just as easy as PowerPC.
I took notes on my discoveries along the way. Let's look at two functions, each in both PowerPC and x86 flavors. For those of you who only know PowerPC assembly, I hope you'll be pleasantly surprised.
Before we begin, I'm going to assume that you're comfortable with assembly in general (though not necessarily with any particular instruction set). If you have the latest developer tools, launch Shark (in /Developer/Applications/Performance Tools), and in its Help menu you can access various ISA references. In addition, Apple has ABI documentation for both the PowerPC and x86. I'm going to go over each function twice (once for PowerPC and once for x86); feel free to skim the PowerPC version if you're accustomed to it. And finally, this is only for the 32-bit version of each platform; things change even more with 64 bits.
SetWindowTitleWithCFString
The trail always begins with a public call that uses the SPI that you want to figure out. In this case, I chose SetWindowTitleWithCFString because it has to somehow set the title of a window even if it's minimized. I went with Carbon because the dynamic nature of Objective-C sometimes makes tracing Cocoa code harder.
PowerPC
<+0>: mflr r0 // save linkage
<+4>: stmw r30,-8(r1) // stash r30, r31
<+8>: mr r30,r4 // save r4 (new title)
<+12>: stw r0,8(r1) // make stack frame
<+16>: stwu r1,-80(r1) // make stack frame
This is the prologue of the function. The PowerPC doesn't have a dedicated stack pointer (convention is to use r1 for that), so the common way of implementing subroutine calls by pushing the return address onto the stack doesn't work. Instead, the PowerPC has a link register and an instruction, bl, that branches and puts the return address (the address of the instruction after the bl) into the link register. Thus, almost every function starts with mflr r0, to pull the old PC into a usable register. Then in <+4> we save off some registers that we're going to smash. Every function needs scratch registers to hold local variables, and usually the high-numbered registers are used; since r13 through r31 are nonvolatile (a function must preserve them for its caller), any we use have to be saved first. The stmw (store multiple words) instruction is useful for ditching many high registers on the stack at once. Then in <+12> we drop the old PC onto the stack, and in <+16> we allocate 80 bytes on the stack. (stwu stores the old stack pointer at the new top of the stack and updates r1 in a single instruction, which keeps the chain of stack frames intact.)
A note on parameter passing. Integer-sized parameters (the only kind we'll be dealing with today) are passed into a function starting with r3 and going up through the registers (as far as r10). Return values are returned in r3. So we see that in <+8> we stick away the pointer to the new title in r30 (whose previous value was stored on the stack earlier).
<+20>: bl 0x92881384 <_Z13GetWindowDataP15OpaqueWindowPtr>
<+24>: li r0,-5600 // errInvalidWindowRef
<+28>: cmpwi cr7,r3,0 // if no window data, bail
<+32>: beq- cr7,0x928d2ae0 <+60>
<+36>: cmpwi cr7,r30,0 // if no string to set, bail
<+40>: li r0,-50 // paramErr
<+44>: beq- cr7,0x928d2ae0 <+60>
<+48>: mr r4,r30
This is where we must start making inferences as to what the code is doing. Fortunately, we have the symbols, so it's not too hard. We see that the WindowRef is used as the parameter to a C++ function, GetWindowData(OpaqueWindowPtr*), as the WindowRef was passed in as r3 and r3 wasn't altered before the call. In addition, note that the function's return value, being in r3, will overwrite the WindowRef value, which wasn't saved in a high register. That's fine, as the WindowRef was just an index into a table and won't be needed further.
At this point we run some checks. We compare both r3 and r30 to zero, and if either is zero we jump to the end with r0 set to the appropriate error code. (The end of the function will move r0 into r3 for return.)
The PowerPC condition register has eight condition fields. Why are we using cr7 here? Probably because cr7 is volatile and we can get away with not saving/restoring it.
<+52>: bl 0x928d2af8 <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+56>: li r0,0 // return noErr
<+60>: addi r1,r1,80 // tear down stack frame and return
<+64>: mr r3,r0
<+68>: lwz r0,8(r1)
<+72>: lmw r30,-8(r1)
<+76>: mtlr r0
<+80>: blr
The rest is pretty simple. We call a member function, WindowData::SetTitleCommon(const __CFString*), and then do the common teardown. We restore the stack pointer, put the return value into r3, restore the registers, move the old PC back into the link register, and branch to the link register (blr), returning us to our caller.
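Put back together, the listing amounts to a handful of lines of C. The sketch below is a reconstruction, not Apple's source: the types are stand-ins so it compiles without Carbon, and GetWindowData, WindowData_SetTitleCommon and the struct bodies are hypothetical stubs shaped after the mangled names. The control flow and error codes are the ones the listing shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types (hypothetical bodies) so this compiles without Carbon.
   The mangled names say GetWindowData takes an OpaqueWindowPtr* and
   SetTitleCommon takes a const __CFString* (i.e. a CFStringRef). */
typedef int32_t OSStatus;
struct OpaqueWindowPtr { int unused; };
typedef struct OpaqueWindowPtr *WindowRef;
struct FakeCFString { const char *text; };       /* stand-in for __CFString */
typedef const struct FakeCFString *CFStringRef;
typedef struct WindowData { CFStringRef title; } WindowData;

enum { noErr = 0, paramErr = -50, errInvalidWindowRef = -5600 };

/* Hypothetical stubs for the two private routines in the listing. */
static WindowData gFakeData;
static WindowData *GetWindowData(WindowRef window)
{
    return window != NULL ? &gFakeData : NULL;
}
static void WindowData_SetTitleCommon(WindowData *data, CFStringRef title)
{
    data->title = title;
}

/* Control flow recovered from the PowerPC listing. */
static OSStatus SetWindowTitleWithCFString_sketch(WindowRef window, CFStringRef title)
{
    WindowData *data = GetWindowData(window); /* <+20>: result replaces r3 */
    if (data == NULL)
        return errInvalidWindowRef;           /* <+24>..<+32> */
    if (title == NULL)
        return paramErr;                      /* <+36>..<+44> */
    WindowData_SetTitleCommon(data, title);   /* <+48>..<+52> */
    return noErr;                             /* <+56> */
}
```

Note that the order of the checks is visible from outside: a call with both a bad window and a NULL title reports errInvalidWindowRef, because the window is checked first.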
x86
The PowerPC register file is really easy: r0, r1, r2 ... r31. x86 has fewer registers and they've historically had different roles (accumulator, base, source index, destination index, and so on). Seriously, forget about that. There are eight registers you care about. eax, ebx, ecx, edx, esi, and edi are all general-purpose registers. esp is the stack pointer. ebp is the frame pointer. That's it.
PowerPC assembly reads right to left (destination first, except for stores). x86 AT&T syntax generally reads left to right (source first, destination last).
<+0>: push %ebp // make stack frame
<+1>: mov %esp,%ebp // make stack frame
<+3>: push %esi // stash %esi
<+4>: sub $0x14,%esp // make stack frame
x86 is stack-based. The caller puts a function's parameters at the top of the stack, with the rightmost parameter at the highest address, and then executes a call instruction. call pushes the return address onto the stack, so even before we hit <+0> the first parameter is four bytes above the stack pointer. In <+0> we save off the old frame pointer and in <+1> we establish our stack frame. At this point ebp is fixed for the entire function. In <+3> we save the old value of esi, which we're going to use, and in <+4> we allocate space on the stack.
This is a perfect example of an ideal stack frame. ebp is the frame pointer. It points to the stack slot holding the caller's frame pointer. ebp+4 is the return address into the function that called us. ebp+8 is the first parameter passed in, ebp+12 is the second, etc. Immediately below ebp are the values saved from the registers, which will be restored before the return. And below that is a bunch of stack space used for either register spillage or calling subsequent functions. One interesting note is that parameters are rarely pushed onto the stack for a call. The stack pointer doesn't move once we make it past the prologue. We just write the arguments into the memory at and right above esp (the stack pointer) and make the call.
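The frame can be written down as a C struct and checked against the offsets the listing uses. This is a sketch with descriptive field names of my own choosing; uint32_t models the 4-byte i386 slots regardless of the host. The last check relies on the Mac OS X IA-32 ABI rule that the stack is 16-byte aligned at every call, which is why the prologue subtracts 0x14 rather than just the 8 bytes of outgoing arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SetWindowTitleWithCFString's frame as seen from %ebp after
   "push %ebp; mov %esp,%ebp". Field names are labels, not symbols. */
struct frame_above_ebp {
    uint32_t saved_ebp;      /*  0(%ebp): caller's frame pointer      */
    uint32_t return_address; /*  4(%ebp): pushed by the caller's call */
    uint32_t window_ref;     /*  8(%ebp): first parameter             */
    uint32_t new_title;      /* 12(%ebp): second parameter            */
};

static_assert(offsetof(struct frame_above_ebp, window_ref) == 0x8, "mov 0x8(%ebp),%eax");
static_assert(offsetof(struct frame_above_ebp, new_title) == 0xc, "mov 0xc(%ebp),%esi");

/* Bytes between the caller's (16-byte-aligned) %esp and ours after the
   prologue: return address + saved %ebp + saved %esi + sub $0x14,%esp. */
enum { FRAME_BYTES = 4 + 4 + 4 + 0x14 };
static_assert(FRAME_BYTES % 16 == 0, "%esp stays 16-byte aligned for our own calls");
```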
<+7>: mov 0x8(%ebp),%eax // get WindowRef in %eax
<+10>: mov 0xc(%ebp),%esi // get new title in %esi
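The parameters are passed on the stack. Since fiddling in memory is slow, we pull the values into registers. It's actually pretty analogous to how things go in PowerPC. There, low registers like r3 are reused for parameter passing, so important values are kept in the high registers. On x86 the parameters go on the stack and values are kept in registers (while they can). Why eax and esi? The title has to survive the call to GetWindowData, so it goes in esi, which the ABI says a callee must preserve (which is also why we saved the old esi in the prologue). The WindowRef is dead once it's handed to GetWindowData, so the scratch register eax is fine for it.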
<+13>: mov %eax,(%esp) // put WindowRef on the stack
<+16>: call 0x92dfb8f6 <_Z13GetWindowDataP15OpaqueWindowPtr>
With the PowerPC, you can tell how many parameters a function has by how many registers, starting with r3, are loaded. Here, just look at the stores relative to esp: (%esp) is the first parameter, 0x4(%esp) the second, and so on.
<+21>: mov %eax,%edx // stick WindowData into %edx
<+23>: mov $0xffffea20,%eax // errInvalidWindowRef
<+28>: test %edx,%edx // if no window data, bail
<+30>: je 0x92e4bb04 <+54>
<+32>: test %esi,%esi // if no string to set, bail
<+34>: mov $0xffce,%ax // paramErr
<+38>: je 0x92e4bb04 <+54>
Return values come back from functions in eax, but otherwise this is pretty much the same. The only thing of interest to note is the clever use of the peculiar register structure. In <+23> the constant 0xffffea20 (-5600) is loaded into eax. But on <+34> the constant 0xffce is loaded into ax. Since ax is just an alias for the lower 16 bits of eax, the upper half of the word is left as 0xffff and we get the full constant 0xffffffce (-50) in eax. Why do this? Because loading a 32-bit constant takes 5 bytes while loading a 16-bit constant only takes 4.
<+40>: mov %esi,0x4(%esp) // load new title as param 2
<+44>: mov %edx,(%esp) // load WindowData as param 1
<+47>: call 0x92e4bb0c <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+52>: xor %eax,%eax // return noErr
Same stuff as before. The one note is the zeroing of eax with an xor. It's just a trick: xor %eax,%eax encodes in 2 bytes (<+52> to <+54>), against 5 for the equivalent mov $0x0,%eax, so the generated code is both smaller and faster.
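Because each <+N> is a byte offset from the start of the function, the instruction lengths can be read straight off the listing. The sketch below writes out the standard encodings by hand (the listing doesn't show raw bytes, and an assembler may pick a different but equally long encoding for the xor) so the sizes can be compared with the offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-assembled encodings of the instructions discussed above. */
static const unsigned char mov_imm32_eax[] = { 0xb8, 0x20, 0xea, 0xff, 0xff }; /* mov $0xffffea20,%eax */
static const unsigned char mov_imm16_ax[]  = { 0x66, 0xb8, 0xce, 0xff };       /* mov $0xffce,%ax (0x66: 16-bit operand prefix) */
static const unsigned char xor_eax_eax[]   = { 0x31, 0xc0 };                   /* xor %eax,%eax */
static const unsigned char mov_zero_eax[]  = { 0xb8, 0x00, 0x00, 0x00, 0x00 }; /* mov $0x0,%eax */

static_assert(sizeof xor_eax_eax < sizeof mov_zero_eax, "xor is the shorter way to zero eax");
```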
<+54>: add $0x14,%esp // tear down stack frame and return
<+57>: pop %esi
<+58>: leave
<+59>: ret
<+60>: nop
<+61>: nop
Mirror image of the stack frame creation. The two nops after the ret are never executed; they're padding before the next function.
UpdateDockTitle
That wasn't so hard, was it? Whether stack- or register-based, it's basically the same.
At this point I'd like to talk about UpdateDockTitle. There are a few tricks in here, so I'll focus the commentary on those.
PowerPC
<+0>: mflr r0 // save linkage
<+4>: stmw r28,-16(r1) // stash r28, r29, r30, r31
<+8>: mr r30,r3 // save r3 (WindowData)
<+12>: bcl- 20,4*cr7+so,0x928d2bd4 <+16>
<+16>: mflr r31 // get ip in r31
Whoa... what?
Short story: <+12> is an unconditional branch-and-link.
Long story: On the PowerPC, instructions like bge and its relatives are just aliases for a more primitive branch instruction, bc (branch conditional). In this case, the first parameter is 20 (0b10100), which indicates “branch always”. Since it's always going to branch, the second parameter (which condition-register bit to test) doesn't matter, so it was set to all 1 bits, 31 (which translates to 4*cr7+so).
Why do this? Because we're going to need to access some PC-relative data, and the PowerPC has no PC-relative addressing mode for loads and stores. And the register move instructions can't access the PC. So we cheat by taking an unconditional branch-and-link to the very next address. Since it's a branch and link, the link register is filled with the address of the next instruction (which in this case equals the address just jumped to), and that can be moved into a normal register.
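Why a branch-conditional with the condition “branch always” instead of a plain bl to the next instruction? Both can branch relative to the current address, so that isn't the reason. The reason is branch prediction: some implementations predict blr targets with a stack of return addresses that branch-and-link instructions push onto, and a bl that isn't a real call would leave that stack out of step. bcl 20,31,$+4 is the form the architecture documents for reading the current address, and implementations can recognize it and leave the return-address stack alone.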
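To see where 20 and 4*cr7+so come from, here's a sketch that decodes the instruction word field by field. The listing doesn't show raw bytes; 0x429f0005 is bcl 20,31,$+4 assembled by hand from the B-form layout (primary opcode 16), and decode_bform and its struct are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC B-form (conditional branch) instruction word.
   IBM numbers bits from the most significant end: bit 0 is the MSB. */
struct bform {
    unsigned opcode; /* bits 0-5: 16 for bc/bcl                  */
    unsigned bo;     /* bits 6-10: branch options                */
    unsigned bi;     /* bits 11-15: condition-register bit       */
    int32_t  disp;   /* bits 16-29 times 4, sign-extended        */
    unsigned aa;     /* bit 30: absolute target if set           */
    unsigned lk;     /* bit 31: set the link register if set     */
};

static struct bform decode_bform(uint32_t w)
{
    struct bform f;
    f.opcode = w >> 26;
    f.bo = (w >> 21) & 0x1f;
    f.bi = (w >> 16) & 0x1f;
    f.disp = (int32_t)(w & 0xfffc);
    if (f.disp & 0x8000)
        f.disp -= 0x10000;      /* sign-extend the 16-bit byte displacement */
    f.aa = (w >> 1) & 1;
    f.lk = w & 1;
    return f;
}

/* BO values of the form 1z1zz ignore both the CR bit and CTR: branch always. */
static int bo_is_branch_always(unsigned bo)
{
    return (bo & 0x14) == 0x14;
}
```

BI = 31 splits into CR field 31 / 4 = 7 and bit 31 % 4 = 3, and bit 3 of a field is so (lt, gt, eq, so), which is how the disassembler arrives at 4*cr7+so. The displacement of 4 with LK set is "branch and link to the next instruction".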
<+20>: stw r0,8(r1)
<+24>: stwu r1,-80(r1) // make stack frame
<+28>: addis r28,r31,3533
<+32>: bl 0x928d2c50 <_Z15GetTitleForDockP10WindowData>
<+36>: lbz r0,-3364(r28) // haul initialization boolean into r0
This is where intuition comes in. We're hauling in some random byte from a PC-relative address. (lbz is load byte and zero, which loads one byte from memory and clears the high bits.) What's byte-sized? A Boolean, the one-byte Mac OS type. (Not a C++ bool: with GCC on PowerPC, a bool is four bytes.) Why a Boolean? Booleans are flags. And with the value of the byte gating the call to RegisterAsDockClientPriv, it's a safe bet that it's an initialization flag.
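Here's how the address of that byte is formed. r31 holds the address of <+16> (0x928d2bd4, the bcl target), addis adds 3533 shifted left 16 bits, and lbz adds the signed displacement -3364. Because the low half is sign-extended, the linker rounds the high half up when the offset's bit 15 is set (the ha16()/lo16() operators in Apple's assembler). The helpers below are hypothetical, and the offset 0x0dccf2dc and address 0xa05a1eb0 are my own arithmetic from the listing, not values shown by the debugger.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "addis r28,r31,HI" followed by "lbz r0,LO(r28)".
   Both immediates are signed 16-bit; addis shifts its own left by 16. */
static uint32_t ppc_addis_disp(uint32_t r31, int16_t hi, int16_t lo)
{
    uint32_t r28 = r31 + ((uint32_t)(int32_t)hi << 16);
    return r28 + (uint32_t)(int32_t)lo;
}

/* Split a 32-bit offset into a rounded high half and a signed low half,
   so that (hi << 16) + lo == offset (modulo 2^32). */
static void split_ha_lo(uint32_t offset, int16_t *hi, int16_t *lo)
{
    uint32_t ha = ((offset + 0x8000u) >> 16) & 0xffffu;
    uint32_t l = offset & 0xffffu;
    *hi = (int16_t)(ha >= 0x8000u ? (int32_t)ha - 0x10000 : (int32_t)ha);
    *lo = (int16_t)(l >= 0x8000u ? (int32_t)l - 0x10000 : (int32_t)l);
}
```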
<+40>: mr r29,r3 // stash new title into r29
<+44>: cmpwi cr7,r0,0 // was initialized?
<+48>: bne- cr7,0x928d2c04 <+64> // if so, skip
<+52>: bl 0x9287f864 <_Z24RegisterAsDockClientPrivv> // else initialize
<+56>: li r0,1 // and set flag
<+60>: stb r0,-3364(r28) // as being initialized
<+64>: mr r3,r30
<+68>: mr r4,r29
<+72>: bl 0x928d2c68 <SyncPlatformWindowTitle> // call with (WindowData, new title)
<+76>: lwz r0,344(r30) // pull (WindowData + 344)
<+80>: andis. r2,r0,64 // and pull a flag bit out of it (minimized?)
More intuition here. r30 contains a pointer to the WindowData class instance, and we're accessing some word 344 bytes in. We don't care about the destination register (we don't touch r2 again in this function), but don't miss the name of the opcode: “andis.” Remember that the period means to update cr0. (The immediate is shifted left 16 bits, so this tests the bit 0x00400000.)
Once again, this is obviously a flag (bit-sized this time). But what does it mean? Context tells us that we only call CoreDockSetItemTitle when it's set. Thus, guessing that it's the is-minimized flag is safe.
<+84>: beq- 0x928d2c38 <+116> // if not minimized, skip this step
<+88>: addi r1,r1,80
<+92>: lwz r3,196(r30) // load WID
How do I know that WindowData+196 is the CoreGraphics WID? I used Quartz Debug to look at the window list for a sample app. The app only had one window, and the listed WID matched.
<+96>: mr r4,r29 // load new title
<+100>: lwz r0,8(r1)
<+104>: lmw r28,-16(r1) // tear down stack frame
<+108>: mtlr r0
<+112>: b 0x92b58ce4 <dyld_stub_CoreDockSetItemTitle>
Note that the stack-frame teardown code appears twice, once on each path out of the function. In this case we're tail calling CoreDockSetItemTitle: we tear down our own frame and jump to it with a plain b rather than bl, so that it's as if our caller had called it directly. This is equivalent to the code return CoreDockSetItemTitle(wid, newTitle). From the setup of r3 and r4 we can deduce its parameters (the WID and the new title). Can we figure out the return type, though? Not really. The calling code ignores it, so we can ignore it too.
<+116>: addi r1,r1,80
<+120>: li r3,0
<+124>: lwz r0,8(r1)
<+128>: lmw r28,-16(r1)
<+132>: mtlr r0
<+136>: blr
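As with the first function, the whole thing fits in a few lines of C. This is a reconstruction under assumptions: the WindowData layout models only the two fields the listing touches, the flag mask is the guessed is-minimized bit, OSStatus as the return type is a guess, and every stub body (GetTitleForDock, RegisterAsDockClientPriv, SyncPlatformWindowTitle, CoreDockSetItemTitle) is hypothetical, with parameters inferred from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t OSStatus;          /* return type is a guess */
typedef uint32_t WID;              /* hypothetical name for the CoreGraphics window ID */
typedef const char *TitleRef;      /* stand-in for CFStringRef */

/* Hypothetical layout: only the fields the PowerPC listing reads. */
typedef struct WindowData {
    unsigned char unknown0[196];
    WID wid;                                         /* lwz r3,196(r30) */
    unsigned char unknown1[344 - 196 - sizeof(WID)];
    uint32_t flags;                                  /* lwz r0,344(r30) */
} WindowData;
static_assert(offsetof(WindowData, wid) == 196, "WID offset");
static_assert(offsetof(WindowData, flags) == 344, "flags offset");

#define kMinimizedBit 0x00400000u  /* guess: andis. r2,r0,64 tests 64 << 16 */

/* Hypothetical stubs; the counters record which ones ran. */
static int gRegisterCalls, gDockCalls;
static TitleRef GetTitleForDock(WindowData *w) { (void)w; return "Untitled"; }
static void RegisterAsDockClientPriv(void) { gRegisterCalls++; }
static void SyncPlatformWindowTitle(WindowData *w, TitleRef t) { (void)w; (void)t; }
static OSStatus CoreDockSetItemTitle(WID wid, TitleRef t) { (void)wid; (void)t; gDockCalls++; return 0; }

static OSStatus UpdateDockTitle_sketch(WindowData *window)
{
    static unsigned char sRegistered;   /* the byte at -3364(r28); starts at 0 */
    TitleRef title = GetTitleForDock(window);
    if (!sRegistered) {                 /* lbz / cmpwi / bne- */
        RegisterAsDockClientPriv();
        sRegistered = 1;                /* li r0,1 / stb */
    }
    SyncPlatformWindowTitle(window, title);
    if (window->flags & kMinimizedBit)  /* andis. / beq- */
        return CoreDockSetItemTitle(window->wid, title); /* tail call: b, not bl */
    return 0;                           /* li r3,0 */
}
```

The static byte is why RegisterAsDockClientPriv runs only on the first call, however many windows change titles afterwards.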
x86
<+0>: push %ebp // make stack frame
<+1>: mov %esp,%ebp
<+3>: sub $0x28,%esp
<+6>: mov %ebx,-0xc(%ebp) // save %ebx
<+9>: call 0x92e4bbe4 <+14>
<+14>: pop %ebx // IP > %ebx
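We're doing the same trick here to get the PC into a register, and for the same reason as on PowerPC. In 32-bit mode, x86 only has PC-relative addressing for jumps and calls, not for loading or storing data, and no instruction simply copies the PC into a general-purpose register. So position-independent code calls the very next instruction and pops the return address that the call pushed. (64-bit x86 added RIP-relative data addressing, which makes the trick unnecessary there.)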
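Here's the call/pop pair as a toy model. Only the stack traffic is simulated; toy_cpu and its helpers are hypothetical. The address of <+9> (0x92e4bbdf) is inferred from the listing: <+14> is 0x92e4bbe4, and a call with a 32-bit displacement is 5 bytes long. The flag address at the end is my own sum of ebx and the displacement used at <+31>.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of "call next; pop %ebx" in 32-bit position-independent code. */
struct toy_cpu {
    uint32_t stack[8];
    int depth;       /* words currently on the toy stack */
    uint32_t ebx;
};

/* call rel32 pushes the address of the following instruction. Here the
   target *is* the following instruction, so execution just falls through. */
static void toy_call_next(struct toy_cpu *cpu, uint32_t call_addr)
{
    const uint32_t call_len = 5;  /* opcode E8 + 32-bit displacement */
    cpu->stack[cpu->depth++] = call_addr + call_len;
}

/* pop %ebx: the pushed return address becomes the PC-relative base. */
static void toy_pop_ebx(struct toy_cpu *cpu)
{
    cpu->ebx = cpu->stack[--cpu->depth];
}
```

After the pop, the stack is balanced again and ebx holds the address of <+14>, so cmpb $0x0,0xd51a36c(%ebx) reads a fixed address no matter where the library was loaded.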
<+15>: mov %esi,-0x8(%ebp) // save %esi
<+18>: mov 0x8(%ebp),%esi // WindowData > %esi
<+21>: mov %edi,-0x4(%ebp) // save %edi
This almost looks like it was compiled by a different compiler. In the previous function, esi was pushed, and then the stack pointer dropped. Here, we create the stack space first and then move the contents of three registers (edi, esi, and ebx) into it. I suspect that things change once we also have to save ebx, though I don't know why.
<+24>: mov %esi,%eax // %esi (WindowData) > %eax
<+26>: call 0x92e4bc40 <_Z15GetTitleForDockP10WindowData>
Whoa. If we're calling a function, we should be setting the parameter via stack-relative addressing off esp. What's going on here?
The point of an ABI is that it's a documented way for functions to call each other. But if a function, say GetTitleForDock(WindowData*), is a short one that's not public and is only used under controlled circumstances, why worry about setting up the stack? In this particular case, GetTitleForDock happens to be a nine-instruction routine. It isn't worth the hassle of a stack frame, so it's reasonable to pass the one parameter in eax.
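If you ever wanted to call a private routine like this from your own code, you'd have to match its convention rather than the ABI's. On 32-bit x86, GCC and Clang can express "first argument in eax" with the regparm(1) attribute; my guess is that the compiler picked this convention on its own here, as GCC can for functions that are only called from within their own compilation unit. The struct and helper below are hypothetical, and the attribute is applied only when compiling for i386, since it means nothing elsewhere.

```c
#include <assert.h>

/* regparm(1): pass the first integer argument in %eax instead of on the
   stack. Only meaningful on 32-bit x86, so it expands to nothing elsewhere. */
#if defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
#define PASS_IN_EAX __attribute__((regparm(1)))
#else
#define PASS_IN_EAX
#endif

struct WindowData { int title_id; };  /* hypothetical stand-in */

/* Hypothetical helper declared with the convention the listing shows for
   GetTitleForDock: argument in %eax, result in %eax. */
static int PASS_IN_EAX get_title_id(struct WindowData *w)
{
    return w->title_id;
}
```

Getting this wrong is silent: a caller using the stack convention would leave garbage in eax, and the callee would happily dereference it.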
<+31>: cmpb $0x0,0xd51a36c(%ebx) // test initialization boolean
<+38>: mov %eax,%edi // window title > %edi
<+40>: jne 0x92e4bc0c <+54> // if initialized, skip
<+42>: call 0x92df9fe0 <_Z24RegisterAsDockClientPrivv> // else initialize
<+47>: movb $0x1,0xd51a36c(%ebx) // and set flag as being initialized
<+54>: mov %edi,0x4(%esp) // new title (param 2)
<+58>: mov %esi,(%esp) // WindowData (param 1)
<+61>: call 0x92e4bc52 <SyncPlatformWindowTitle>
<+66>: xor %eax,%eax // clear %eax (noErr?)
<+68>: testb $0x2,0x159(%esi) // test flag (WindowData + 0x159) (minimized?)
<+75>: je 0x92e4bc35 <+95> // if not minimized, skip this step
<+77>: mov %edi,0x4(%esp) // new title (param 2)
<+81>: mov 0xc4(%esi),%eax // (WindowData + 0xC4) WID
<+87>: mov %eax,(%esp) // (param 1)
<+90>: call 0xa0a52ad1 <dyld_stub_CoreDockSetItemTitle>
<+95>: mov -0xc(%ebp),%ebx
<+98>: mov -0x8(%ebp),%esi
<+101>: mov -0x4(%ebp),%edi
<+104>: leave
<+105>: ret
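The x86 listing reads the same WindowData fields as the PowerPC one, just spelled differently. 0xc4 is 196, the WID offset. The flag test looks unrelated (andis. ...,64 on the word at 344 versus testb $0x2 on the byte at 0x159, which is 345), but both fit one assumption: that the flag is bit 9 of a 32-bit bitfield word at offset 344, with GCC allocating bitfields from the most significant bit on big-endian PowerPC and from the least significant bit on little-endian x86. The helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the flag is bit n, counted in bitfield allocation order,
   of a 32-bit word at WindowData + 344. */

/* PowerPC allocates from the MSB: the word mask, and the andis. immediate
   that tests it (valid for n < 16, where the mask lies in the upper half). */
static uint32_t ppc_mask(unsigned n)      { return UINT32_C(1) << (31 - n); }
static uint32_t ppc_andis_imm(unsigned n) { return ppc_mask(n) >> 16; }

/* x86 allocates from the LSB, and testb picks the one byte holding the bit. */
static uint32_t x86_byte_offset(uint32_t word_offset, unsigned n) { return word_offset + n / 8; }
static uint32_t x86_byte_mask(unsigned n) { return UINT32_C(1) << (n % 8); }
```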
Conclusion
Yes, x86 assembly sucks. Having only two operands rather than three for an instruction is a pain. Having only six general-purpose registers instead of twenty or so is a real pain.
But really, come on. You're not writing it. You're reading it.
It's compiler-generated. Nothing fancy.
Hold your horror. x86 isn't that bad.
Comments
"Endian little hate we."
On the plus side, the 64-bit ABI is much better, having a few more real registers to work with and the mess that is x87 is often avoided too.
Posted by: alexr | May 23, 2009 6:39 PM
Note that with gcc a bool on PPC is four bytes by default; the lbz is more likely dealing with a Boolean.
Posted by: Ned Holbrook | May 29, 2009 1:14 AM
Ned: excellent point. Fixed for the Google Mac Blog copy of the article. Thanks!
Posted by: Avi | June 3, 2009 2:14 PM
Hello Avi, could you help me with some CoreDock hacking? Is there an email address that I can contact you at? Thanks!
Posted by: Edward | September 25, 2009 4:36 PM