Intel assembler on Mac OS X
Apr 25, 2007 • uliwitness
I’ve always wanted to learn another assembler, and with one of my colleagues being a real assembler guru, and the Intel reference books on my bookshelf, and the Intel switch just behind us, I thought this would be a good opportunity to finally get going with x86 assembler.
Now, assembler programming under Mac OS X isn’t quite as well documented as one would wish. There’s no tutorial that I could find (lots of tutorials for Linux and Windows, but none for Mac OS X yet). This won’t be one either. Rather, this is a blog posting where I share what I found out about assembler on OS X, and it is probably only useful to someone who already knows some assembler but just doesn’t know Intel on Mac OS X. My main approach is to compile C source code into assembler source files using GCC. Then I can look at that code and find out which assembler instructions correspond to which C statement. If all of this turns out to be correct and I should happen to have loads of time on my hands, I may still go out there and turn this into a decent tutorial.
The basics are pretty simple
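The listing that belongs here is missing from this copy. A minimal sketch consistent with the walkthrough that follows, assuming 32-bit AT&T syntax as GCC emits it on Mac OS X (the comments are added for illustration):

```asm
    .text
    .globl  _main
_main:
    pushl   %ebp            # save the caller's base pointer (4 bytes)
    movl    %esp, %ebp      # base pointer = current stack pointer
    subl    $8, %esp        # pad: 4 (return address) + 4 (saved ebp) + 8 = 16
    movl    $0, %eax        # the return value 0 goes in eax
    leave                   # restore the stack pointer and the caller's base pointer
    ret                     # pop the return address and continue there
```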
Now, the underscore in front of «main» is a convention in C, so just accept it. When you enter the _main function, the return address (i.e. the instruction where the program will continue after the function has finished, aka «back pointer») has already been pushed on the stack, taking up 4 bytes. We also save the base pointer (the point where our caller can find its parameters on the stack) to the stack, and set it to the current stack pointer (which is where our parameters are). That takes another 4 bytes, so we have 8 bytes now. Since the stack should be aligned on 16 bytes before you can make a call to another function, we subtract another 8 from the stack pointer, which pads out the stack (we could also just do two «pushl $0» for the same effect). If we used any local variables, we would use this opportunity to subtract more for them.
Now comes the actual body of our function. What we do is simply return 0. This is done by stuffing 0 in the eax register.
Finally, we have the tail end of our function, which executes leave (which cleans up by restoring our caller’s base pointer and stack pointer) and then ret, which pops the return address off the stack and continues execution there.
Calling a local function
Calling a function is fairly simple, as long as it’s a local one right in the same file as ours. In that case, what you do is you first declare that function:
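The original listing is missing here. A sketch of what it could look like, assuming doSomething takes two parameters (the values 1 and 2 are placeholders) and main calls it once:

```asm
    .text
_doSomething:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $8, %esp
    nop                     # doSomething's real code would go here
    leave
    ret

    .globl  _main
_main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $24, %esp       # padding plus two 4-byte parameter slots: 4 + 4 + 24 = 32
    movl    $2, 4(%esp)     # parameter #2, one slot above #1
    movl    $1, (%esp)      # parameter #1, at the stack pointer itself
    call    _doSomething    # pushes the return address and jumps
    movl    $0, %eax
    leave
    ret
```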
«nop» is a do-nothing instruction I just inserted here to show where doSomething’s code would go. That’s pretty easy. You just write the function, push the parameters on the stack and use call to jump to the function, and that will take care of pushing the return address and all that. The only tricky thing is passing the parameters. You have to pad first, and then push (or mov, in our case) the parameters in reverse order (i.e. #1 is at the bottom of the stack, #2 above it etc.). That’s because otherwise the function being called would have to skip the padding. Well, could be worse.
Accessing parameters
To access any parameters, you address relative to the base pointer. The 4 bytes at the base pointer generally hold your caller’s base pointer, and the 4 bytes after that hold the return address, so you need to add 4 + 4 = 8 bytes to reach the first parameter. Yes, add: since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make the stack larger, you need to add to the stack pointer to find something on the stack. The same applies to our base pointer, of course:
Moving 12(%ebp) into eax and then adding 8(%ebp) to it would store your second parameter in eax and then add the first parameter to it, leaving the result in eax, where it’s ready for use as a return value. Note the ##(foo) syntax, which adds the number ## to the pointer foo. This is register-relative addressing.
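The listing is missing at this point. A minimal sketch, assuming a function with two 4-byte parameters; the name _addParameters is hypothetical:

```asm
_addParameters:
    pushl   %ebp
    movl    %esp, %ebp
    movl    12(%ebp), %eax  # second parameter: 8 bytes of bookkeeping + 4 for parameter #1
    addl    8(%ebp), %eax   # add the first parameter; the sum stays in eax
    leave
    ret
```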
An added benefit of this is that you can actually pass more parameters to a function than it knows to handle, and it will just ignore the rest.
Fetching data
To access data (e.g. strings), it gets trickier. You declare data like the following:
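The declaration itself did not survive in this copy. A sketch of the form described below, where the label myHelloWorld comes from the text and the string contents are a placeholder:

```asm
    .cstring
myHelloWorld:
    .ascii  "Hello World!\0"    # .ascii adds no terminator, so the \0 is written out

    .text                       # switch back to the code section for the function
```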
So, you add a .cstring section at the top of the function, and in that you declare a label and use the .ascii keyword to actually stash your string there. So far, so good, there’s only one problem:
All data manipulation is done using absolute addresses. But we don’t know at what position in memory our program will be loaded. Labels aren’t absolute addresses, they get compiled into relative offsets from the start of our code. So, how do we find out at which absolute address our string myHelloWorld is? Well, the trick Mach-O uses is that it knows that our program will be loaded as one huge chunk. So, we know that any instruction in our code will always stay at the same distance from our string.
So, if we could only get the address of one instruction in our code that has a label, we could calculate the absolute address of our string from that. Now, look above, at our function call code. Notice anything? Our return address is an absolute pointer to the next instruction after a function call. So, all we need to do to get our address is call a function. When you assemble C source code, they call this helper function ___i686.get_pc_thunk.bx, which is quite a mouthful. Let’s just call it _nextInstructionAddress:
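The call is missing from this copy. A sketch of it, using the two names the text introduces:

```asm
    call    _nextInstructionAddress     # pushes the address of the next instruction, then jumps
myAnchorPoint:                          # execution resumes here with that address in ebx
```

Note that ebx is a callee-saved register in the 32-bit calling convention, so a real function would save and restore it around this.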
That’s what we call somewhere at the start of our code to find our own address. Note how I cleverly already added a label myAnchorPoint, which labels the instruction whose address we’ll get. Then we somewhere (e.g. at the bottom) define that function:
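The definition is missing here as well. A sketch that matches the description below:

```asm
_nextInstructionAddress:
    movl    (%esp), %ebx    # copy the return address into ebx without popping it
    ret                     # return to myAnchorPoint
```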
We don’t even bother aligning the stack or changing and restoring the base pointer. This simply peeks at the last item on the stack (the return address) and stashes that in register ebx. Then it returns (and obviously doesn’t call leave because we pushed no base pointer that it could restore).
Once we have this address in ebx, we can do the following to get our string’s address into a register, and from there onto the stack:
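The two instructions are missing from this copy. A sketch of them, assuming the string is the first parameter of the upcoming call and the stack slot for it has already been reserved:

```asm
    leal    myHelloWorld-myAnchorPoint(%ebx), %eax  # eax = ebx + (myHelloWorld - myAnchorPoint)
    movl    %eax, (%esp)                            # store the address in the slot at the stack pointer
```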
LEA means «Load Effective Address», i.e. take an address and stash it into a register. myHelloWorld-myAnchorPoint calculates the difference between our two labels, and thus tells us how far myHelloWorld is from myAnchorPoint. Since myHelloWorld is probably at the start of the program, e.g. at address 3 maybe, and myAnchorPoint further down, say at address 20, what we get is a negative value, e.g. -17. And xxx(%ebx) is how you tell the assembler that you want to add an offset to a register to get a memory address. ebx contains the address of myAnchorPoint, so what this does is subtract 17 from myAnchorPoint’s absolute address, giving us the absolute address of myHelloWorld! Whooo! And this mess is called «position-independent code».
Now, our LEAL instruction loads a «Long» (which is 32 bits, i.e. the size of a pointer on a 32-bit CPU) and stashes it into register eax. And the movl instruction moves that long from our register into the last item on the stack, ready for use as a parameter to a function.
Calling a system function
Now, it’d be really nice if we could printf() or something, right? Well, trouble is, we don’t know the address of printf(). But this time it’s actually easy. We add a new section at the bottom of our code:
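The section listing is missing here. A sketch modeled on what GCC emits for 32-bit Mac OS X, using the «_stub» label names described below:

```asm
    .section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
_printf_stub:
    .indirect_symbol _printf
    hlt ; hlt ; hlt ; hlt ; hlt     # five 1-byte placeholders, patched by the loader
_getchar_stub:
    .indirect_symbol _getchar
    hlt ; hlt ; hlt ; hlt ; hlt
```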
This is a new section named __IMPORT,__jump_table. It has the type symbol_stubs and the attributes self_modifying_code and pure_instructions. 5 is the size of each stub in bytes, and intentionally is the same as the number of hlt statements below.
This section is special, because when our code is loaded, the loader will look at it. It will see that there is an .indirect_symbol directive for a function named «printf», and will look up that function. Then it will replace the five hlt instructions, each of which is one byte in size, with an instruction to jump to that address (hence the self_modifying_code). We also added a label for each indirect symbol, which we name the same as the symbol, just with «_stub» appended.
So, to call printf, all you have to do now is push the string on the stack and then call _printf_stub.
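As a sketch, assuming the string’s address is already in the slot at the stack pointer and the stub label is _printf_stub:

```asm
    call    _printf_stub    # jumps through the stub into printf
```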
The call will jump to _printf_stub and immediately continue to printf itself. And just to show you that you can have several such imported symbols, I’ve also included a stub for getchar. Now note that the system usually doesn’t name these symbols «_foo_stub», but rather «L_foo$stub» (yes, a label name can contain dollar signs; you can even put the label in quotes and have spaces in it). Same difference.
Okay, so that’s how much I’ve guessed my way through it so far. Comments? Corrections? If you want
PS: Thanks to John Kohr, Alexandre Colucci, Jonas Maebe, Eric Albert and Jordan Krushen, all of whom helped me figure this out one way or the other. Thanks, guys!
Update: Added mention of how to actually access parameters.
Assembler 4+
Quote-Unquote Apps
Description
Assembler is the remarkably useful utility for joining together text files — including Fountain, Markdown and .csv. If you have a bunch of little files and need to make a big one, this is your app.
Assembler saves you the hassle of lots of copy-and-pasting, or obscure terminal commands.
1. Drag-and-drop to add files. You can even add a folder at once.
2. Arrange files in the order you want them assembled.
3. Click Save and you’ll get a brand-new file with all the pieces put together.
If you have Highland installed, you can even open the new file directly in the app.
Assembler is a godsend for screenwriters working in Fountain. Write your scenes separately, then combine them only when you need to.
For writers working in plain text or Markdown, Assembler makes it simple to combine sections and chapters.
If you find yourself working with .csv files — such as PayPal exports, or Kickstarter backer reports — Assembler makes it quick and easy to merge them into a single file.
Assembler for Mac OS
Assembler on a Mac? Yes We Can!
A list of sample assembly programs that demonstrate how to program at the machine-instruction level. Each program in this project is documented with comments. Use this README.md to get started, then jump to ASSEMBLER.md to go further.
Program | Description |
---|---|
hello.s | Have a look at this first Hello World assembly code |
formatstring.s | Display a formatted string on screen |
parameters.s | Show how parameters are used when calling a program or function |
operations.s | Sample program for stepping through common instructions in the debugger |
registers.s | Assembly program to show addressing of registers |
Please note that you need the Unix as (assembler) and ld (linker) utilities to use the sample programs included in this project. These utilities are installed with the Command Line Developer Tools that ship with Xcode. The easiest way to install them is to open Terminal and run the ld command; if you don’t have the tools yet, you should get a prompt to start the installation.
Use the included shell script utility asm.sh to compile, link and run assembly code. Format is:
This utility will automatically call as to assemble an assembly source file into an object file (.o). It will then call the linker ld to create an executable from the object file. As an example, the following command will assemble, link and run the hello.s assembly code:
This will produce hello.o object code and hello executable. This last one can also be directly started from the command line:
Important Note
You may need to specify which version of Mac OS X you are using in the asm.sh script:
Debug assembly code
You can use lldb to debug an executable program. For example the following command will start a debug session:
Debug Command | Description |
---|---|
b main | Set a breakpoint at the start symbol (main) of a program |
run | Run code till a breakpoint is found |
run par1 par2 | Run code using input parameters |
b 0x1f8d | Set a breakpoint at address 0x1f8d |
s | Step into instruction (i.e. step into a call statement) |
n | Step over instruction (i.e. step over a call statement) |
c | Continue execution till a breakpoint is found |
q | Terminate execution and exit lldb |
register read | Show content of main registers (abbreviated re r) |
re read esp eax | Show content of the esp and eax registers |
re write eax 0xF12F | Write content to the eax register |
memory read 0xbffffb8c | Read content of memory address |
x 0xbffffb8c | Same as memory read , abbreviated form |
x --count 100 0xbffffb8c | Read 100 bytes from memory address 0xbffffb8c |
watch set e 0x1f67 | Watch for changes at a memory address (a watchpoint) |
gui | When entered after run show debugger in a GUI |
TIP: After entering a run command in lldb, try using the gui command as well.
Some handy shell commands
Command | Description |
---|---|
hexdump -C FileName | Hexadecimal dump of FileName. Tip: pipe using head -n10 |
gcc -S prg.c -m32 -Os | Generate assembly code from a C program |
lldb Program | Debug an executable program |
Dive deeper in Assembler by reading ASSEMBLER.md.
A continuous learning path where passion is the drive.