Author Topic: Dynamically compiled functions (Read 7920 times)

Mike Lobanovsky · « **Reply #15 on:** July 28, 2014, 09:32:25 PM »

Regretfully this isn't so, Charles.

Please have a look at the picture below. This is a failed attempt to run your script in JIT mode by pressing F5 in the IDE with DEP on. It shows:

1. MS Data Execution Prevention notifying me that gxo2 has been closed by the system to protect my computer.

2. An invitation to add gxo2 to the DEP exception list (i.e. allow it to execute its data as code).

3. The usual MS apology for closing gxo2.exe in case I decline to add it to the exception list as per Item 2.

FBSL avoids this mess by allocating heap buffers for its DynAsm and DynC JIT code using the VirtualAlloc/VirtualProtect/VirtualFree WinAPI family.


Mike Lobanovsky · « **Reply #16 on:** July 28, 2014, 10:57:11 PM »

Charles,

Could you please move both Aurel's question and my answer to the Bytecode Interpreter Concept thread?

Hi Aurel,

No, the bytecode commands (op_halt, op_push_int, etc.) are actually stored in the code_arr[] array together with the variables' values, according to the keywords and operators (sym_print, sym_while, sym_or, sym_and, etc.) and the literal numeric and string values as the parser sees them. The notion of a particular "variable" and its "type" (only integer for now) exists only while its declaration is being evaluated by the parser() function, which reserves its storage space as a location (a "cell") in the same code_arr[] array (or outside it, if you like). The interpreter() function later refers to these locations by their "index" number in bytecode commands such as push_int_var X and store X, where X is the "index" of the location (n=1, lim=2, k=3, p=4).

See also the Code listing... in your console for bytecode commands such as push_int X, which means "push literal value X", as opposed to push_int_var X, which means "push the value stored at the location that has index number X". store X always means "store the result of the current operation at the location that has index number X".

The interpreter() function doesn't know anything about either "variables" or their "types". It deals only with the bytecoded commands and the nearby literal numeric and string pointer values stored in the same code_arr[] array. It also uses numeric literal values stored at the locations indexed 1 to 4 or saves literal results of numeric operations in them.

It "pushes" the literals into the auxiliary "stack" array stack[] and applies arithmetic operations to its elements that have just been "pushed". Each bytecode corresponds to a certain arithmetic or print operation that is applied by the interpreter to the stack[] array elements "pushed" into it. It also retrieves and stores values from/to the locations (former "variables") indexed 1 to 4. In other words, the interpreter() function is the code executor function.

Your own interpreter uses 3 arrays. The toy interpreter uses only one "interleaved" bytecode/literal array code_arr[] and a small auxiliary "stack" array stack[] to "de-interleave" (separate) the literals from the bytecode/literal code_arr[] array.
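To make the layout described above concrete, here is a minimal Python sketch. It is not the toy interpreter's actual source. The names op_halt, op_push_int, push_int_var and store come from the post; OP_ADD, OP_PRINT_INT, the opcode numbering and the choice to keep cells 1 to 4 at the start of code_arr[] are assumptions made only for illustration.

```python
# Hypothetical opcode numbering; only the names op_halt, op_push_int,
# push_int_var and store appear in the post, the rest are assumed.
OP_HALT, OP_PUSH_INT, OP_PUSH_INT_VAR, OP_STORE, OP_ADD, OP_PRINT_INT = range(6)


def interpreter(code_arr, pc):
    """Execute interleaved bytecode/literals starting at index pc."""
    stack = []    # auxiliary stack that separates literals from bytecodes
    output = []   # collected "print" results
    while True:
        op = code_arr[pc]
        pc += 1
        if op == OP_HALT:
            return output
        elif op == OP_PUSH_INT:        # operand is a literal value
            stack.append(code_arr[pc])
            pc += 1
        elif op == OP_PUSH_INT_VAR:    # operand is the index of a cell
            stack.append(code_arr[code_arr[pc]])
            pc += 1
        elif op == OP_STORE:           # operand is the index of a cell
            code_arr[code_arr[pc]] = stack.pop()
            pc += 1
        elif op == OP_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == OP_PRINT_INT:
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown bytecode {op} at index {pc - 1}")


# Index 0 is unused, cells 1..4 stand for n, lim, k, p, code starts at 5.
code_arr = [
    0, 0, 0, 0, 0,
    OP_PUSH_INT, 5, OP_STORE, 1,                          # n = 5
    OP_PUSH_INT_VAR, 1, OP_PUSH_INT, 2, OP_ADD, OP_STORE, 1,  # n = n + 2
    OP_PUSH_INT_VAR, 1, OP_PRINT_INT,                     # print n
    OP_HALT,
]
result = interpreter(code_arr, 5)
print(result)  # [7]
```

Note that interpreter() never sees a variable name or a type. It only sees numbers, some of which it treats as commands and some as operands, depending on position.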

The toy interpreter uses a lot fewer resources so far than your own implementation.

Charles Pegge · « **Reply #17 on:** July 29, 2014, 03:15:49 AM »

Mike,

Unfortunately, the forum system does not seem to provide a way of moving individual messages.

Thanks for your info on DEP and VirtualAlloc. In JIT mode, everything is data execution.

Does this follow the correct protocol for making dynamic code acceptable to DEP? If so, I will use it for Oxygen JIT compiling.

Code:

'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366781(v=vs.85).aspx
'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx

extern lib "kernel32.dll"
! VirtualAlloc   (sys lpAddr,dwSize,flAllocationType,flProtect) as sys 'lpAddr
! VirtualProtect (sys lpAddr,dwSize,flNewProtect,*lpflOldProtect) as sys 'bool
! VirtualFree    (sys lpAddr,dwSize,dwFreeType) as sys 'bool
end extern

% MEM_COMMIT             0x1000
% MEM_RESERVE            0x2000


'Memory Protection Constants
'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx

% PAGE_EXECUTE           0x10
% PAGE_EXECUTE_READ      0x20
% PAGE_EXECUTE_READWRITE 0x40

'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366892(v=vs.85).aspx

% MEM_DECOMMIT           0x4000
% MEM_RELEASE            0x8000

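sys p,size,prot

size=0x1000
'commit one page as readable, writable and executable
p=VirtualAlloc 0,size,MEM_COMMIT,PAGE_EXECUTE_READWRITE

byte b at p

'machine code: mov eax,0x04030201 : ret
b={0xb8,1,2,3,4,0xc3}

'drop write access before running; prot receives the old protection
VirtualProtect p,size,PAGE_EXECUTE_READ,prot

print hex call p

'decommit the page (the address range itself stays reserved)
VirtualFree p,size,MEM_DECOMMIT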


pber · « **Reply #18 on:** July 29, 2014, 06:12:26 AM »

Quote from: Mike Lobanovsky on July 28, 2014, 07:51:12 PM

In this case the developer may use the VirtualAlloc/VirtualProtect/VirtualFree WinAPI's with a required set of readable/writable/executable attributes.

Ok, now I see.
Thanks, Mike, for ALL your info. I'm in your debt.

Smalltalk allows the program to change op-codes on the fly.
Java doesn't, but it lets you provide your own security policy via ClassLoaders.

Same thing in Win32:
the guru you hire still needs a security layer on top of Win32,
i.e. he must be sure the executables are not patched before loading them.

Mike Lobanovsky · « **Reply #19 on:** July 29, 2014, 11:49:08 AM »

Yes Charles,

1. This effectively covers the entire protocol recommended by MS to be followed in order to bypass the DEP restrictions. Just one more very subtle issue is that there's also a device in a Pentium CPU called "the instruction cache" that can buffer repeated instruction sequences, such as those found in short loops. So if such a sequence is sitting in the instruction cache while the associated memory page's access attributes are being changed by a call to VirtualProtect(), the access attributes of the cached instructions will not change. To resolve this situation, a call to FlushInstructionCache() is recommended before modifying the attributes of a page that currently has an EXECUTE attribute set.

2. While the protocol is correct, what you have in your script is an attempt to cope with the problem on the client side of the language interface. This most probably won't work and actually your new script is blocked by DEP just as its predecessor was.

This may mean that the current implementation of gxo2.exe still uses some internal permutations that fall outside the scope of memory area preallocated with VirtualAlloc(EXECUTE_READWRITE). The protocol should also be implemented within the gxo2 engine proper and its sources should ensure compilation/recompilation is performed strictly within the preallocated executable memory area.

FBSL uses memory pools for all its internal allocations so MEM_COMMIT/MEM_DECOMMIT calls can appear only at the pool level transparently to the user. Consequently, explicit client-side VirtualAlloc/VirtualFree need to be used very rarely and only in the user code that would operate on the user pointers outside the scope of internal memory pools, for example, when self-modifiable dynamically recompiled user functions are being developed.

3. There's also a simpler and dirtier hack that uses standard memory allocations (from custom-coded allocators or general-purpose malloc alike). It simply uses VirtualProtect(PAGE_EXECUTE_READWRITE) to change memory attributes only once after compilation/recompilation and before the dynamic code is to be executed for the first time. To tell you the terrible military secret, that's exactly what FBSL uses in its DynAsm and DynC JIT compilers.
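For readers who want to try the allocate, write, protect and call sequence from item 1 outside of gxo2, here is a hedged Python/ctypes sketch rather than OxygenBasic code. It is an illustration only, not what O2 or FBSL do internally. The function name run_generated_code is hypothetical, the six code bytes are the ones from Charles's script, and the function simply returns None on anything other than x86 or x64 Windows. The sketch calls FlushInstructionCache after the code bytes have been written, which is where the Windows documentation for that function places it.

```python
import ctypes
import platform
import sys

MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
MEM_RELEASE = 0x8000
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20

# mov eax, 0x04030201 ; ret  (the same six bytes as in Charles's script)
CODE = bytes([0xB8, 1, 2, 3, 4, 0xC3])


def run_generated_code():
    """Run CODE from executable memory; return None where unsupported."""
    is_x86 = platform.machine().lower() in ("x86", "i386", "i686", "amd64", "x86_64")
    if sys.platform != "win32" or not is_x86:
        return None

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.VirtualAlloc.restype = ctypes.c_void_p
    k32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                 ctypes.c_uint32, ctypes.c_uint32]
    k32.VirtualProtect.restype = ctypes.c_int
    k32.VirtualProtect.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                   ctypes.c_uint32, ctypes.POINTER(ctypes.c_uint32)]
    k32.GetCurrentProcess.restype = ctypes.c_void_p
    k32.GetCurrentProcess.argtypes = []
    k32.FlushInstructionCache.restype = ctypes.c_int
    k32.FlushInstructionCache.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                                          ctypes.c_size_t]
    k32.VirtualFree.restype = ctypes.c_int
    k32.VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint32]

    size = 0x1000
    # Step 1: get a writable, not yet executable, page.
    addr = k32.VirtualAlloc(None, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
    if not addr:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Step 2: "compile", i.e. copy the machine code into the page.
        ctypes.memmove(addr, CODE, len(CODE))
        # Step 3: swap write access for execute access.
        old = ctypes.c_uint32(0)
        if not k32.VirtualProtect(addr, size, PAGE_EXECUTE_READ, ctypes.byref(old)):
            raise ctypes.WinError(ctypes.get_last_error())
        # Step 4: make sure the CPU does not run stale cached instructions.
        k32.FlushInstructionCache(k32.GetCurrentProcess(), addr, size)
        # Step 5: call the page as a function returning a 32-bit integer.
        func = ctypes.CFUNCTYPE(ctypes.c_uint32)(addr)
        return func()
    finally:
        # Release the whole reservation; the size must be 0 with MEM_RELEASE.
        k32.VirtualFree(addr, 0, MEM_RELEASE)


value = run_generated_code()
print("unsupported platform" if value is None else hex(value))
```

The page is never writable and executable at the same time here, which is stricter than the PAGE_EXECUTE_READWRITE shortcut described in item 3. Either variant keeps the generated code inside memory that the process has explicitly marked as executable.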

Mike Lobanovsky · « **Reply #20 on:** July 29, 2014, 12:56:50 PM »

Hi Paolo,

Quote from: paolo on July 29, 2014, 06:12:26 AM

Smalltalk allows the program to change op-codes on the fly.

AFAIK Smalltalk is an interpreted language. And I'm stressing once again that interpreters, both sequential and bytecoded ones, don't have any problems with DEP. This is because the actions they are supposed to perform when the program runs are predefined in the functions hardcoded in their engine (i.e. the language binary), specifically in the engine's code section(s). This code is never modified once the engine binary is loaded in memory.

What we usually call "code" for an interpreter is not actually executable machine code at all. It is just a sequence of readable/writable (i.e. modifiable) data, either a verbatim script like in thinBasic or a sequence of bytecodes (i.e. numeric literals representing human words in the initial verbatim script) like in FBSL's BASIC or Smalltalk or Java. This data sequence is exactly what is called "an interpreter program". It simply tells the interpreter which of its predefined functions to execute to perform the actions needed as the interpreted "program" runs.

JIT compilers like O2 and FBSL's DynAsm and DynC do generate genuine executable machine code on the fly. This is why they may be susceptible to DEP restrictions if Charles Pegge and Mike Lobanovsky don't take precautions to guard their respective users against Microsoft's encroachment on their civil rights.

Quote

Java doesn't, but it lets you provide your own security policy via ClassLoaders.

Since the interpreted "program" is simply readable/writable data and not executable machine code, there is theoretically nothing that prevents us from changing it whenever we want. Smalltalk does allow it. So does FBSL's BASIC through its ExecLine() function, which allows the user to modify the existing "code" already loaded in memory or to add new "code" to the "code" that's already running. Java, however, doesn't allow it, presumably to preclude the development of malware "code" from scratch or the modification of "code" already running for ill-intended purposes.
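As a small illustration of why DEP has nothing to object to here, the following Python sketch runs a "program" that rewrites one of its own later instructions. Everything in it (the run function, the "add" and "patch" commands) is hypothetical and is not how Smalltalk or FBSL's ExecLine() work. It only shows that such a program is an ordinary writable list, while the code that actually executes lives in the interpreter.

```python
def run(program):
    """Interpret a list of (name, argument) pairs and return the total."""
    total = 0
    pc = 0
    while pc < len(program):
        name, arg = program[pc]
        pc += 1
        if name == "add":
            total += arg
        elif name == "patch":
            # Overwrite another instruction while the program is running.
            index, new_instruction = arg
            program[index] = new_instruction
        else:
            raise ValueError(f"unknown command {name!r}")
    return total


program = [
    ("add", 1),
    ("patch", (2, ("add", 100))),  # replaces the instruction below
    ("add", 10),
]
total = run(program)
print(total)  # 101
```

No memory page ever changes its protection in this example: the only machine code that runs belongs to the interpreter itself.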

Quote

the guru you hire still needs a security layer on top of Win32,
i.e. he must be sure the executables are not patched before loading them.

Actually there are a lot of methods for developers to protect their creations against undesirable or malicious modification. For example, the integrity of such parts of an FBSL executable as its "code" (script) stub or user resource data can be protected with checksums. The entire .exe file is also protected with its own checksum written into its standard PE header, though of course this "protection" is the weakest of all and can easily be bypassed by malware in a carelessly written, virus-unaware piece of user code.
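The checksum idea can be sketched in a few lines of Python. This is an assumption-laden illustration: it uses CRC32 from the standard zlib module as a stand-in, which is neither FBSL's actual scheme nor the PE header checksum algorithm, and the seal and unseal names are hypothetical.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def unseal(blob: bytes) -> bytes:
    """Return the payload if its trailing CRC32 matches, else raise."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain a checksum")
    payload = blob[:-4]
    stored = int.from_bytes(blob[-4:], "little")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: payload was modified")
    return payload


sealed = seal(b"print 1")
restored = unseal(sealed)          # passes: nothing was changed

tampered = b"P" + sealed[1:]       # flip the first byte of the script
try:
    unseal(tampered)
    detected = False
except ValueError:
    detected = True
```

A plain checksum like this only catches accidental or naive changes. Anyone who can patch the payload can also recompute the checksum, which is the weakness mentioned above.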
