Author Topic: [SOLVED] Help with 64-bit Asm Needed! (Read 4335 times)

Mike Lobanovsky · « **on:** September 09, 2014, 05:47:20 AM »

Hi Charles,

I'm pretty sure I want to start migrating FBSL to 64 bits. As I said earlier, its entire engine is based on the trick that you described in your message here. The appropriate procedures are coded in the FBSL project as direct assembly inlines.

I'm also pretty sure that my knowledge of 64-bit assembly is way below zero. Hence I'm badly in need of your expert advice on a few practical matters described below.

1. I want to save the current rsp in a temp var for further restoration after the function call. What should the proper size of this var be under 64 bits?

2. I want to be able to push my args to the 64-bit function stack. I'm currently doing it in a mixed for() loop in C/assembly. I could've used memcpy() instead but that would be way too slow to call yet another function in each call to my own one. So:

2.1. What is the structure of the function call stack under 64 bits?
2.2. What are the stack sizes of 64-bit function call args and where are they located in the stack and in what order?
2.3. Where is the return address located and what is its size in the call stack?
2.4. Are there any "dark areas" in the function call stack that are reserved for the system and thus shouldn't be overwritten with my call stack construction methods?

3. A 32-bit return instruction ret N has its N sized as a 16-bit quantity. What is its proper size under 64 bits?

I'm writing all this in GCC's inline assembly. It can be written in either Intel or AT&T style. If you see clearly now how the function call engine works in FBSL, I would greatly appreciate your verbatim 64-bit assembly code as an answer. If not then simple answers to the above questions will largely suffice.

Thanks a lot in advance!

Charles Pegge · « **Reply #1 on:** September 09, 2014, 06:49:21 AM »

Hi Mike

Quote

1. I want to save the current rsp in a temp var for further restoration after the function call. What should the proper size of this var be under 64 bits?

It should be stored in a 64-bit variable: a pointer type, or a 64-bit integer such as long long (or uintptr_t in C99).
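A minimal sketch of this in GCC inline assembly, in case it helps. The names saved_rsp_t and read_stack_pointer are hypothetical, not taken from FBSL, and the value returned is rsp as seen inside the helper, so it only illustrates the variable size and the asm syntax:

```c
#include <stdint.h>

/* uintptr_t is an unsigned integer wide enough to hold a pointer,
   so it is 8 bytes on a 64-bit target. */
typedef uintptr_t saved_rsp_t;

/* Hypothetical helper: return the current stack pointer.
   Only meaningful on x86-64 with GCC-style inline asm; other
   targets get 0 so the file still compiles. */
saved_rsp_t read_stack_pointer(void)
{
#if defined(__x86_64__)
    saved_rsp_t sp;
    __asm__ volatile ("mov %%rsp, %0" : "=r"(sp));   /* AT&T syntax */
    return sp;
#else
    return 0;
#endif
}
```

The typedef keeps the storage size tied to the pointer width, so the same source would still fit a 32-bit build.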

Quote

2. I want to be able to push my args to the 64-bit function stack. I'm currently doing it in a mixed for() loop in C/assembly. I could've used memcpy() instead but that would be way too slow to call yet another function in each call to my own one. So:

Quote

2.1. What is the structure of the function call stack under 64 bits?

2.2. What are the stack sizes of 64-bit function call args and where are they located in the stack and in what order?

2.3. Where is the return address located and what is its size in the call stack?

Assuming you want to use the MS64 bit calling convention:

Each stack slot is 8 bytes wide, even for longs, dwords, etc.

Create a stack frame of at least 32 bytes, aligned down to 16 bytes. If you fail to align rsp to 16 bytes, you will be exiled to Crashbania.

sub rsp,48 'will take 5 or 6 params

The first 4 params are passed in registers in this order: RCX, RDX, R8, R9

But Direct floats must be passed in the corresponding 4 SIMD Registers XMM0,XMM1,XMM2,XMM3

The lower 32 bytes of your stack frame are used by the function as a spill zone for these registers.

If there are more than 4 params, store them in [rsp+32], [rsp+40], [rsp+48], etc.

The call is made in the usual way, pushing the return address onto the stack. This, of course, puts the stack pointer out of 16-byte alignment, so it is one of the duties of the function prologue to remedy this.

After making the call, release your stack frame:

add rsp,48
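The frame-size arithmetic above can be sketched in C. The function name ms64_frame_size is hypothetical, and the sketch assumes rsp is already 16-byte aligned before the sub and that every argument occupies exactly one 8-byte slot:

```c
#include <stddef.h>

/* Hypothetical helper: bytes to subtract from rsp before an MS64 call.
   Assumes rsp is 16-byte aligned beforehand and that each argument
   occupies exactly one 8-byte slot. */
size_t ms64_frame_size(size_t nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;   /* spill zone: always 4 slots */
    size_t bytes = slots * 8;               /* every slot is 8 bytes wide */
    return (bytes + 15) & ~(size_t)15;      /* round up to a multiple of 16 */
}
```

With 5 or 6 params this gives 48, matching the sub rsp,48 above; with 4 or fewer it gives the minimum of 32.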

Values are returned in the RAX register, except for floats, which are returned in SIMD register XMM0.

Quote

2.4. Are there any "dark areas" in the function call stack that are reserved for the system and thus shouldn't be overwritten with my call stack construction methods?

Only the spill zone

Quote

3. A 32-bit return instruction ret N has its N sized as a 16-bit quantity. What is its proper size under 64 bits?

ret n is not used, just as with cdecl (the caller releases the stack frame), but its operand remains 16 bits.

These rules can be vexatious, so for internal functions, you may prefer to use stdcall or cdecl or something entirely customised, as long as the stack pointer is kept in 16-byte (128-bit) alignment before making calls.

Mike Lobanovsky · « **Reply #2 on:** September 09, 2014, 09:48:27 AM »

Thanks for your input, Charles!

I do not want to use MS64. I need only STDCALL and I will use it for internal and user function calls, DynC and DynAsm interface, and external DLL calls. In user-defined STDCALL calls to CDECL DLL's, the correct (in fact, reversed) order of arguments will be the user's own responsibility (because FBSL doesn't make use of function prototypes in any form) while stack balancing will be done by restoring the rsp pointer from the var where it was stored before the call. That's how it works in 32 bits and that's how I'd like to see it implemented in 64 bits too, if at all possible.

Since this mechanism should be universal for all the functions -- internal, user-defined, user-defined Asm and C, and external in 3rd-party DLL's -- any custom schemes are out of the question. Everything has to be standardized to STDCALL because this is where I can't do anything about the existing calling convention that the 64-bit Windows system DLL's like kernel, user and gdi are expecting.

<SNIP> <SNIP> (for copyright reasons )

So finally,

1. Are 64-bit system and ordinary user DLL's called by STDCALL/CDECL conventions?

2. Is my scheme described above correct as far as the structure of 64-bit function stack is concerned?

Charles Pegge · « **Reply #3 on:** September 09, 2014, 10:24:55 AM »

Hi Mike,

For DLLs / MS calls you must use MS64. This is the only calling convention used on MS 64 bit platforms. So if you want consistency throughout ...

http://www.youtube.com/watch?v=4Swevi4g8x0

Here is an ASM MessageBox example

Code: [Select]

$ FileName "t.exe"
#include "..\..\inc\RTL64.inc"

Declare Function MessageBox Lib "user32.dll" Alias "MessageBoxA" (ByVal hwnd As Long, ByVal lpText As String, ByVal lpCaption As String, ByVal wType As Long) As Sys

zstring tit[]="64Bit OxygenBasic"
zstring msg[]="Hello World!"

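sub  rsp, 32      'reserve the 32-byte spill zone
mov  r9,  0       'wType = 0 (MB_OK)
lea  r8,  tit     'lpCaption
lea  rdx, msg     'lpText
mov  rcx, 0       'hwnd = 0 (no owner window)
call MessageBox
add  rsp, 32      'release the frame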

Mike Lobanovsky · « **Reply #4 on:** September 09, 2014, 10:54:12 AM »

Charles,

You're killing my swan song on the rise. :-\

The message box example would be too simple, I'm afraid. Can I ask you for a DLL call with at least 5 function arguments and a usable return value, please?

Charles Pegge · « **Reply #5 on:** September 09, 2014, 11:28:39 AM »

Very well, I will now reveal the ghastly details using a complex example. 64bit calling at its nastiest:

Code: [Select]

% FileName "t.exe"
include "$\inc\RTL64.inc"

extern

#show function f(double d1,d2 ,sys i3,i4,i5,single d6) as double
#show end function

end extern

double d

#show d=f(1.0, 2.0, 3, 4, 5, 1.0)

When you run it, the asm for the prolog, epilog and call will be listed in message boxes. You can try different param types and see how o2 handles them.
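For reference, here is a rough C sketch of where the MS64 rules described earlier in the thread would place each argument of a call like f above. It is not o2's actual output; the function names are hypothetical, and each argument is assumed to fit one 8-byte slot:

```c
#include <stddef.h>

/* Hypothetical helper: register used for the argument at zero-based
   position `index`, or NULL when it is passed on the stack.
   `is_float` marks a float or double passed by value. The register is
   chosen by position, so a double in position 1 uses XMM1. */
const char *ms64_arg_register(size_t index, int is_float)
{
    static const char *int_regs[4] = { "RCX", "RDX", "R8", "R9" };
    static const char *flt_regs[4] = { "XMM0", "XMM1", "XMM2", "XMM3" };

    if (index >= 4)
        return NULL;
    return is_float ? flt_regs[index] : int_regs[index];
}

/* Offset from rsp (in the caller, just before the call) of the 8-byte
   slot that belongs to argument `index`. For the first four arguments
   this is their place in the spill zone. */
size_t ms64_arg_stack_offset(size_t index)
{
    return index * 8;
}
```

Under these assumptions the six arguments of f would land in XMM0, XMM1, R8, R9, [rsp+32] and [rsp+40].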

PS: Musical accompaniment

http://www.youtube.com/watch?v=ICsZE17MHwA

Mike Lobanovsky · « **Reply #6 on:** September 09, 2014, 11:49:28 AM »

Charles,

Your information is priceless, your message boxes, awesome, and your visuals and soundtrack, irreproachable as always.

I'm marking the topic as SOLVED.

Thank you very much again!

Charles Pegge · « **Reply #7 on:** September 09, 2014, 11:58:05 AM »

A pleasure Mike.

MS64 is for the Daleks, and Linux64 (AMD) is for the Cybermen. They fight each other for global domination!

http://en.wikipedia.org/wiki/X86_calling_conventions

Quote

x86-64 calling conventions
x86-64 calling conventions take advantage of the additional register space to pass more arguments in registers. Also, the number of incompatible calling conventions has been reduced. There are two in common use.

Microsoft x64 calling convention
The Microsoft x64 calling convention[9][10] is followed on Microsoft Windows and pre-boot UEFI (for long mode on x86-64). It uses registers RCX, RDX, R8, R9 for the first four integer or pointer arguments (in that order), and XMM0, XMM1, XMM2, XMM3 are used for floating point arguments. Additional arguments are pushed onto the stack (right to left). Integer return values (similar to x86) are returned in RAX if 64 bits or less. Floating point return values are returned in XMM0. Parameters less than 64 bits long are not zero extended; the high bits are not zeroed.

When compiling for the x64 architecture in a Windows context (whether using Microsoft or non-Microsoft tools), there is only one calling convention — the one described here, so that stdcall, thiscall, cdecl, fastcall, etc., are now all one and the same.

In the Microsoft x64 calling convention, it's the caller's responsibility to allocate 32 bytes of "shadow space" on the stack right before calling the function (regardless of the actual number of parameters used), and to pop the stack after the call. The shadow space is used to spill RCX, RDX, R8, and R9,[11] but must be made available to all functions, even those with fewer than four parameters.

For example, a function taking 5 integer arguments will take the first to fourth in registers, and the fifth will be pushed on the top of the shadow space. So when the called function is entered, the stack will be composed of (in ascending order) the return address, followed by the shadow space (32 bytes), followed by the fifth parameter.

In x86-64, Visual Studio 2008 stores floating point numbers in XMM6 and XMM7 (as well as XMM8 through XMM15); consequently, for x86-64, user-written assembly language routines must preserve XMM6 and XMM7 (as compared to x86 wherein user-written assembly language routines did not need to preserve XMM6 and XMM7). In other words, user-written assembly language routines must be updated to save/restore XMM6 and XMM7 before/after the function when being ported from x86 to x86-64.

System V AMD64 ABI
The calling convention of the System V AMD64 ABI[12] is followed on Solaris, GNU/Linux, FreeBSD, Mac OS X, and other UNIX-like or POSIX-compliant operating systems. The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, while XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments. For system calls, R10 is used instead of RCX.[12] As in the Microsoft x64 calling convention, additional arguments are passed on the stack and the return value is stored in RAX.

Registers RBP, RBX, and R12-R15 are callee-save registers; all others must be saved by the caller if they wish to preserve their values.[13]

Unlike the Microsoft calling convention, a shadow space is not provided; on function entry, the return address is adjacent to the seventh integer argument on the stack.
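The two entry layouts quoted above can be compared with a small C sketch. The function names are hypothetical, and only integer or pointer arguments of 8 bytes each are assumed:

```c
#include <stddef.h>

/* Offsets are from rsp at callee entry, immediately after the call
   instruction has pushed the 8-byte return address at [rsp]. */

/* MS64: every argument position owns a slot above the return address;
   the first four slots are the shadow space. `arg_index` is zero-based. */
size_t ms64_entry_offset(size_t arg_index)
{
    return 8 + 8 * arg_index;
}

/* System V: there is no shadow space, so the first stack-passed
   argument (the seventh integer argument) sits right above the
   return address. `stack_arg_index` is zero-based among stack args. */
size_t sysv_entry_offset(size_t stack_arg_index)
{
    return 8 + 8 * stack_arg_index;
}
```

So the fifth integer argument under MS64 would be found at [rsp+40] on entry, while the seventh under System V would be at [rsp+8].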

Brian Alvarez · « **Reply #8 on:** December 26, 2018, 02:35:23 PM »

This music gives me chills and bristles my skin... Thanks Charles.
This example is outstanding, as I'm implementing inline assembly.

Also listen to this (especially the last couple of minutes!!):
https://www.youtube.com/watch?v=pSNeoC_AzYY
