Oxygen Basic
Programming => Example Code => General => Topic started by: Charles Pegge on July 15, 2014, 07:31:06 AM
-
Dynamically compiled functions can now use 'link' syntax to create a function address table for the host. This is consistent with primary compiling (in thinBasic, linking for calls from thinBasic).
function ffs( string v) as string, link pf[2]
UPDATE:
http://oxygenbasic.org/o2zips/Oxygen.zip
'#FILE "T.EXE"
'=================================
'Dynamically compiled mini library
'=================================
function Build(string src,sys *pf) as sys
=========================================
static sys base,libr
static string er
base = ebx
libr = compile src
er=error
if er then
print "Dynamic " er
libr=0
else
call libr 'initialise library
indexbase 0
end if
return libr
end function
'
string funlib=quote """
'
function ffi(sys v) as sys, link pf[1]
return v*3
end function
'
function ffs( string v) as string, link pf[2]
return v+" "+v
end function
'
function fff( float v) as float, link pf[3]
return v*2
end function
'
"""
'TEST
'====
sys a
sys p[3]
a=build(funlib,p)
if a then
! fi (sys v) as sys at p[1]
! fs (string v) as string at p[2]
! ff (float v) as float at p[3]
cr=chr(13,10)
print fi(14)+cr+ff(1.25)+cr+fs("Qwerty")
freememory a
end if
-
Thanks Charles ,
Gives me some ideas -- (it may be possible to avoid foreign function calls in (at least) NewLisp -- see attachment)
I finally managed to allocate contiguous memory within NewLisp that can be indexed in the usual way (huge C strings containing NULL characters). No foreign calls are needed to read/write data any more ...
NewLisp is a very fast interpreter (2-5x slower than native code, depending on the application); some JIT code may be welcome.
Such a combination would be perfect -- as for Lisp, the inventor Mr McCarthy was not sure a Lisp source could be compiled completely (Lisp was born in 1958), and nowadays the people from NewLisp and PicoLisp make no attempts in this direction. Maybe Mr Burger from PicoLisp said it correctly: compiled Lisp is no Lisp any more.
I have the same experience: in a recent program I wrote formulae that morph themselves depending on the situation. It ran fine interpreted, but upon compiling, the machine told me it could not do such things. But then in Common Lisp, definitions can be compiled or not, resulting in something like a tB+Oxygen situation, but with dynamically compiled functions ... 8)
best Rob
-
Hi robbek,
Maybe Mr Burger from PicoLisp said it correctly : compiled Lisp is no Lisp any more.
Why should something continue to be itself once compiled?
Does "This is not Lisp" stand for "it's not a sexp" (or maybe "it's not a list")?
Or is there something more that I do not see?
-
Hi Paolo,
Quoting him :
"Only an interpreted Lisp can fully support such "Equivalence of Code and Data". If executable pieces of data are used frequently, like in PicoLisp's dynamically generated GUI, a fast interpreter is preferable over any compiler. "
Attached is something that may make it clearer: in an interpreted Lisp it is possible to assign the elements of definitions as if they were lists and change them dynamically. Here, every time the function is called it changes itself (some very clever things are possible this way ..)
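The attachment itself is not reproduced in this thread, so here is only a minimal sketch of the idea in Python rather than Lisp. It assumes a definition held as a plain nested list and a tiny evaluator; the names `evaluate`, `definition` and `call_and_morph` are made up for this illustration and come from neither NewLisp nor PicoLisp.

```python
# A function definition held as plain list data, run by a tiny evaluator.
# All names below are hypothetical and exist only for this sketch.

def evaluate(expr, env):
    """Evaluate a nested-list expression such as ['*', 'x', 2]."""
    if isinstance(expr, str):
        return env[expr]              # variable lookup
    if isinstance(expr, (int, float)):
        return expr                   # numeric literal
    op, left, right = expr
    a = evaluate(left, env)
    b = evaluate(right, env)
    if op == '+':
        return a + b
    if op == '*':
        return a * b
    raise ValueError(f"unknown operator {op!r}")

# The "definition" is ordinary data: [parameter name, body expression].
definition = ['x', ['*', 'x', 2]]

def call_and_morph(arg):
    """Run the definition, then let it rewrite its own body."""
    param, body = definition
    result = evaluate(body, {param: arg})
    body[2] += 1                      # self-modification: bump the constant
    return result

results = [call_and_morph(10) for _ in range(3)]
print(results)   # [20, 30, 40]
```

Because the body is just a list, the same call gives a different result each time. A compiler that freezes the body into machine code would have to give up exactly this.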
As for interpreted vs compiled Lisp: PicoLisp and NewLisp run very fast. I made a comparison (sorting a simple-array of 100000 integers and summing the elements):
(i) = interpreted / (c) = compiled

Implementation                                Mode / variant                  Time
CLisp                                         (i)                              390 ms
CLisp                                         (c) bytecode                      90 ms
Clozure CL (native code)                      (i)                               63 ms
Clozure CL (native code)                      (c)                               47 ms
Gnu CL (C obj. file)                          (i)                              190 ms
Gnu CL (C obj. file)                          (c)                               50 ms
Steel Bank CL (native optimized code)         (c)                               31 ms
NewLisp                                       (i) using setq                    66 ms
NewLisp                                       (i) using incf                    62 ms
NewLisp                                       (i) using apply on a sequence     31 ms
LispWorks (Personal version)                  (i)                             2484 ms  (no GC interaction ?!)
LispWorks (Personal version)                  (c)                               62 ms
Racket Scheme (bytecode + GNU Lightning JIT)  with lists                        62 ms
Racket Scheme (bytecode + GNU Lightning JIT)  Vector                            32 ms
Racket Scheme (bytecode + GNU Lightning JIT)  Vector->List + apply              16 ms

(Deep Space 1 used a LispWorks product on board.)
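The Lisp sources behind these timings are not shown in the thread. For anyone who wants to repeat the measurement on their own machine, a rough Python stand-in for the same task (sort 100000 integers, then sum them) could look like the sketch below. The value range, the fixed seed and the helper name `sort_and_sum` are assumptions, and the timing it prints is not comparable with the figures above.

```python
import random
import time

def sort_and_sum(values):
    """Sort a copy of the list and return (sorted list, sum of elements)."""
    ordered = sorted(values)
    return ordered, sum(ordered)

random.seed(1)  # fixed seed so that runs are repeatable
data = [random.randrange(1_000_000) for _ in range(100_000)]

start = time.perf_counter()
ordered, total = sort_and_sum(data)
elapsed_ms = (time.perf_counter() - start) * 1000.0

print(f"sorted and summed {len(ordered)} integers in {elapsed_ms:.1f} ms")
```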
best Rob
-
homoiconicity, you're right.
A compiled lisp should implement a different reader
for the compiled code. After all: ASCII bytes are neither code nor data.
-
Hi Paolo,
If you should consider trying NewLisp , I wrote the Japi, OpenGL , GLU , GLUT bindings for it - just ask ,
Rob
-
A geometric surface explorer would be quite a useful tool - making use of dynamic compiling. All the components are available in Oxygen, including snapshots. It just needs stitching together:
My concept: (using bumpy metallic surface)
(http://www.oxygenbasic.org/o2pics/opengl/SurfaceExplorer1.jpg)
-
Hi Paolo,
If you should consider trying NewLisp , I wrote the Japi, OpenGL , GLU , GLUT bindings for it - just ask ,
Rob
hello RobbeK,
obviously I am not Paolo but I am interested in those bindings for NewLisp. :)
-
Hi Jack, (oops this message escaped me)
Running Win or Linux? .. I wrote a 3D and a 2D skeleton for Win32, and the Japi bindings are complete (in this case I also have the Japi.so file somewhere). It is even more complete than what comes with GNU CL (which has no progress bar and seems to be based on a very early Japi distribution) ... it has a Japi-Primitives package ..
best Rob
-
.........
Quoting him :
"Only an interpreted Lisp can fully support such "Equivalence of Code and Data". If executable pieces of data are used frequently, like in PicoLisp's dynamically generated GUI, a fast interpreter is preferable over any compiler. "
............
Here every time the function is called it changes itself (some very clever things are possible this way .. )
............
Hi Rob,
"Mr Burger from PicoLisp" may be perfectly correct from a very technical point of view.
Compiled code goes into the process memory sections that are usually marked READABLE/EXECUTABLE. The MS Windows memory manager enforces these page attributes and assures that the code stays unmodified (non-writable) for as long as the process runs. OTOH data sections are usually marked as READABLE/WRITABLE and their content can be freely modified as per the process' intended purposes. At the same time, the memory manager and DEP will assure that data within these sections isn't used as executable code.
While it is possible to create the so called "self-modifiable" code and change section attributes at run time, such activities will be considered as extremely suspicious from the AV perspective. That's why it is highly unlikely that the language authors would dare risk their reputation and ignore triggering the AV software even if these are 100% false alarms.
This isn't the case with interpretative languages/modes of operation. Here the executable code of the language's virtual machine stays unmodified at all times but what it really does is entirely dependent on the program data that defines the chain of commands (tokens) which the virtual machine is supposed to execute. Naturally enough, this data (and consequently, the execution flow) is totally reconfigurable (re-writable) at run time while the memory manager, DEP, and AV SW stay unaware. Good virtual machines can do (albeit somewhat slower) everything that static code can, and much much more. Look at such products as VMware, VirtualBox, Virtual PC etc. - these are all virtual machines even more powerful than individual interpretative languages. They can emulate entire workstations including hardware, operating systems, and a plethora of different processes running in them, all in one.
This is one of the unbeatable and decisive advantages of interpretative languages/modes of operation over their static, and therefore restricted, compiler-only counterparts. :)
-
Compiled code goes into the process memory sections that are usually marked READABLE/EXECUTABLE. The MS Windows memory manager enforces these page attributes and assures that the code stays unmodified (non-writable) for as long as the process runs. OTOH data sections are usually marked as READABLE/WRITABLE and their content can be freely modified as per the process' intended purposes. At the same time, the memory manager and DEP will assure that data within these sections isn't used as executable code.
While it is possible to create the so called "self-modifiable" code and change section attributes at run time, such activities will be considered as extremely suspicious from the AV perspective.
[...]
This is one of the unbeatable and decisive advantages of interpretative languages/modes of operation over their static, and therefore restricted, compiler-only counterparts. :)
Thanks Mike: valuable clarification to me.
But now a question knocks at my poor mind:
how can Oxygen generate and run machine-code?
Also, in NewLISP, I saw examples of dynamically generated (and executable) machine-code too.
Is machine code freely runnable/writable if it sits in the heap?
-
Hi Paolo,
'creating and calling machine code
'using an array of bytes
'mov eax, 0x04030201 : ret
byte b={0xb8,1,2,3,4,0xc3}
print hex call @b
-
Morning Charles,
byte b={0xb8,1,2,3,4,0xc3}
print hex call @b
Not possible with your DEP turned on: access denied exception.
Hi Paolo,
Is machine code freely runnable/writable if it sits in the heap?
how can Oxygen generate and run machine-code?
Also, in NewLISP, I saw examples of dynamically generated (and executable) machine-code too.
No, machine code doesn't reside in the heap. The memory image of an executable file is (very roughly and generally) divided into 4 parts:
-- program header that stores pointers to and sizes of program code, data, and resource sections of the program in the process memory;
-- code section(s) proper marked by the OS as non-modifiable (read/execute-only) memory areas;
-- data section(s) proper marked by the OS as modifiable (readable and writable) but non-executable memory areas; and
-- optional resource section that carries various icons, images, menus, dialog templates, strings etc. that the program may need for its purposes.
Apart from that, the program loader reserves two big chunks of memory that the program can use additionally outside its data section(s) in case it needs to create and destroy dynamic objects or pieces of data:
-- stack;
-- heap.
The stack is generally used for passing function parameters and allocating temporary pieces of data local to these functions. The heap is used to create both global and local dynamic objects such as arrays, strings (actually arrays of bytes), compound structures, and class instances. Both of these chunks are also usually marked as modifiable (readable and writable) but non-executable.
As you see, each process memory piece has a strictly predefined purpose and an associated expected behavior. Most user programs fit very well into this pattern but not all. Such programs as just-in-time compilers e.g. like OxygenBasic, FBSL, LuaJIT and some others will also need memory to dynamically generate and write additional executable code into. In this case the developer may use the VirtualAlloc/VirtualProtect/VirtualFree WinAPIs with a required set of readable/writable/executable attributes. These APIs allocate and release additional memory pages with an arbitrary mixture of attributes in the process address space.
Unfortunately, this is exactly what most of the existing viruses would do too for their malicious and destructive activities. Therefore, anti-viral software is taught to sniff unusual attribute combinations (mainly such as concurrent writable and executable attributes), and to also monitor the VirtualAlloc calls.
Since anti-viral protection is considered more important than presumption of the developer's innocence, a lot of less intelligent AV packages (abundant on such sites as e.g. the notorious VirusTotal.com) invariably flag JIT compilers as suspicious/malicious software. Hehe, and it is a serious challenge to outsmart the VirusTotal.com bunch of shitcode AV software to avoid false alarms. ;)
OTOH more intelligent packages with well-developed heuristics engines such as e.g. Kaspersky, Norton, ESET and a few others would never flag a JIT compiler as suspicious.
So there are 3 conceivable ways to break this vicious circle for a JIT compiler project:
-- hire a guru that's smart enough to disguise your "suspicious" activities against the stupidity of shitcode AV's;
-- disclose your sources to the world and pay off the VirusTotal.com gang to include your JIT compiler on their exception list; or
-- resign yourself forever to the turtle pace of fully interpreted code which does not create writable/executable sections in the process heap at all.
Obviously, "Mr Burger from PicoLisp" has chosen behavioral pattern number 3 from this list.
My own indisputable option is however pattern number 1. :)
-
Avoiding DEP issues.
Embedded / Inline machine code:
o2 machine script enables the programmer to access the lowest (and original) level of OxygenBasic.
'creating and calling embedded machine code
'using o2 machine script
print hex call fx
end
fx:
'mov eax, 0x04030201 : ret
o2 b8 01 02 03 04 c3
-
I have a question......
I hope that is not too stupid....
For example, Ed's toy interpreter is a bytecode interpreter ....right?
But it doesn't produce bytecode in a byte-type variable ...right?
It uses an integer array instead.....
So is it possible to force this thing to produce a byte array,
or is this idea nonsense?
thanks
-
Regretfully this isn't so, Charles.
Please have a look at the picture below. This is a failed attempt to run your script in a JIT mode pressing F5 in the IDE with DEP on:
1. MS Data Execution Prevention notifying me that gxo2 has been closed by the system to protect my computer.
2. An invitation to add gxo2 to the DEP exception list (i.e. allow it to execute its data as code).
3. A usual MS apology for closing gxo2.exe in case I reject adding it to the exception list as per Item 2.
FBSL avoids this mess by allocating heap buffers for its DynAsm and DynC JIT code using the VirtualAlloc/VirtualProtect/VirtualFree WinAPI family.
:(
-
Charles,
Can you please move both Aurel's question and my answer to the Bytecode Interpreter Concept thread?
Hi Aurel,
No, the bytecode commands (op_halt, op_push_int, etc) are actually stored in the code_arr[] array together with the variables' values according to the keywords and operators (sym_print, sym_while, sym_or, sym_and, etc) and literal numeric and string values as they are seen immediately by the parser. The notion of a particular "variable" and its "type" (only integer for now) exists only as long as its declaration is being evaluated by the parser() function, and its storage space is reserved as a location (a "cell") in the same code_arr[] array (or out of it, if you like). They are referred to by the interpreter() function later on by their "index" number in such bytecode commands as push_int_var X and store X where X is the "index" of the location (n=1, lim=2, k=3, p=4).
See Code listing... in your console also for such bytecode command as push_int X which means "push literal value X" against push_int_var X which means "push value stored at location that has index number X". store X always means "store result of current operation at location that has index number X".
The interpreter() function doesn't know anything about either "variables" or their "types". It deals only with the bytecoded commands and the nearby literal numeric and string pointer values stored in the same code_arr[] array. It also uses numeric literal values stored at the locations indexed 1 to 4 or saves literal results of numeric operations in them.
It "pushes" the literals into the auxiliary "stack" array stack[] and applies arithmetic operations to its elements that have just been "pushed". Each bytecode corresponds to a certain arithmetic or print operation that is applied by the interpreter to the stack[] array elements "pushed" into it. It also retrieves and stores values from/to the locations (former "variables") indexed 1 to 4. In other words, the interpreter() function is the code executor function.
Your own interpreter uses 3 arrays. The toy interpreter uses only one "interleaved" bytecode/literal array code_arr[] and a small auxiliary "stack" array stack[] to "de-interleave" (separate) the literals from the bytecode/literal code_arr[] array.
The toy interpreter uses a lot fewer resources so far than your own implementation. :)
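The toy interpreter's source is not in this thread, so the following is only a minimal Python sketch of the layout described above: one interleaved code_arr[] holding bytecodes and their literal operands, plus a small auxiliary stack. The opcode numbers, the add and print opcodes, and keeping the cells in a separate list are assumptions made for this illustration.

```python
# Opcode numbers are arbitrary choices made for this sketch.
OP_HALT, OP_PUSH_INT, OP_PUSH_INT_VAR, OP_STORE, OP_ADD, OP_PRINT = range(6)

def interpreter(code_arr, cells):
    """Execute interleaved bytecode; return the list of printed values."""
    stack = []      # auxiliary stack that receives the literals
    output = []
    pc = 0          # index of the next bytecode in code_arr
    while True:
        op = code_arr[pc]
        pc += 1
        if op == OP_HALT:
            return output
        elif op == OP_PUSH_INT:        # operand is a literal value
            stack.append(code_arr[pc])
            pc += 1
        elif op == OP_PUSH_INT_VAR:    # operand is a cell index
            stack.append(cells[code_arr[pc]])
            pc += 1
        elif op == OP_STORE:           # pop the result into a cell
            cells[code_arr[pc]] = stack.pop()
            pc += 1
        elif op == OP_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == OP_PRINT:
            output.append(stack.pop())
        else:
            raise ValueError(f"bad opcode {op} at index {pc - 1}")

# Cells 1..4 stand for the former variables n, lim, k, p (cell 0 is unused).
cells = [0, 0, 0, 0, 0]

# Bytecode for: n = 40 : n = n + 2 : print n
code_arr = [
    OP_PUSH_INT, 40, OP_STORE, 1,
    OP_PUSH_INT_VAR, 1, OP_PUSH_INT, 2, OP_ADD, OP_STORE, 1,
    OP_PUSH_INT_VAR, 1, OP_PRINT,
    OP_HALT,
]

printed = interpreter(code_arr, cells)
print(printed)   # [42]
```

Note that the interpreter never sees a variable name or a type: it only reads opcodes, the literal that follows them, and cell indices.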
-
Mike,
The forum system does not seem to provide a way of moving individual messages unfortunately.
Thanks for your info on DEP and VirtualAlloc.. In JIT mode, all is Data Execution :)
Does this have the correct protocol for making dynamic code acceptable to DEP ? If so then I will use it for Oxygen JIT compiling.
'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366781(v=vs.85).aspx
'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887(v=vs.85).aspx
extern lib "kernel32.dll"
! VirtualAlloc (sys lpAddr,dwSize,flAllocationType,flProtect) as sys 'lpAddr
! VirtualProtect (sys lpAddr,dwSize,flNewProtect,*lpflOldProtect) as sys 'bool
! VirtualFree (sys lpAddr,dwSize,dwFreeType) as sys 'bool
end extern
% MEM_COMMIT 0x1000
% MEM_RESERVE 0x2000
'Memory Protection Constants
'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366786(v=vs.85).aspx
% PAGE_EXECUTE 0x10
% PAGE_EXECUTE_READ 0x20
% PAGE_EXECUTE_READWRITE 0x40
'http://msdn.microsoft.com/en-us/library/windows/desktop/aa366892(v=vs.85).aspx
% MEM_DECOMMIT 0x4000
% MEM_RELEASE 0x8000
sys p,size,prot
size=0x1000
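'allocate one page that is writable while the code bytes are copied in
p=VirtualAlloc 0,size,MEM_COMMIT,PAGE_EXECUTE_READWRITE
byte b at p
b={0xb8,1,2,3,4,0xc3} 'mov eax, 0x04030201 : ret
'drop write access before executing
VirtualProtect p,size,PAGE_EXECUTE_READ,prot
print hex call p
'release the whole region (size must be 0 with MEM_RELEASE)
VirtualFree p,0,MEM_RELEASE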
-
In this case the developer may use the VirtualAlloc/VirtualProtect/VirtualFree WinAPI's with a required set of readable/writable/executable attributes.
Ok, now I see.
Thanks Mike for ALL your info, I'm in your debt.
Smalltalk allows the program to change op-codes on the fly.
Java does not, but it lets you provide your security policy via ClassLoaders.
Same thing in Win32:
the guru you hire still needs a security layer on top of Win32,
i.e. he must be sure the executables are not patched before loading them.
-
Yes Charles,
1. This effectively covers the entire protocol recommended by MS to be followed in order to bypass the DEP restrictions. Just one more very subtle issue is that the CPU also has an "instruction cache" that can buffer repeatable instruction sequences e.g. such as those found in short loops. So if the bytes at some address are rewritten while an older instruction sequence from that address is still sitting in the instruction cache, the CPU may go on executing the stale sequence. To resolve this situation, MS recommends a call to FlushInstructionCache() once the new code has been written into place and before it is executed.
2. While the protocol is correct, what you have in your script is an attempt to cope with the problem on the client side of the language interface. This most probably won't work and actually your new script is blocked by DEP just as its predecessor was.
This may mean that the current implementation of gxo2.exe still uses some internal permutations that fall outside the scope of memory area preallocated with VirtualAlloc(EXECUTE_READWRITE). The protocol should also be implemented within the gxo2 engine proper and its sources should ensure compilation/recompilation is performed strictly within the preallocated executable memory area.
FBSL uses memory pools for all its internal allocations so MEM_COMMIT/MEM_DECOMMIT calls can appear only at the pool level transparently to the user. Consequently, explicit client-side VirtualAlloc/VirtualFree need to be used very rarely and only in the user code that would operate on the user pointers outside the scope of internal memory pools, for example, when self-modifiable dynamically recompiled user functions are being developed.
3. There's also a simpler and dirtier hack that uses standard memory allocations (from custom-coded allocators or general-purpose malloc alike). It simply uses VirtualProtect(PAGE_EXECUTE_READWRITE) to change memory attributes only once after compilation/recompilation and before the dynamic code is to be executed for the first time. To tell you the terrible military secret, that's exactly what FBSL uses in its DynAsm and DynC JIT compilers. :D
-
Hi Paolo,
Smalltalk allows the program to change op-codes on the fly.
AFAIK Smalltalk is an interpretative language. And I'm stressing once again that interpreters, both sequential and bytecoded ones, don't have any problems with DEP. This is because the actions they are supposed to perform when the program runs are predefined in the functions hardcoded in their engine (i.e. the language binary), specifically in the engine's code section(s). This code is never modified once the engine binary is loaded in memory.
What we usually call "code" for an interpreter is not actually executable machine code at all. It is just a sequence of readable/writable (i.e. modifiable) data, either a verbatim script like in thinBasic or a sequence of bytecodes (i.e. numeric literals representing human words in the initial verbatim script) like in FBSL's BASIC or Smalltalk or Java. This data sequence is exactly what is called "an interpreter program". It simply tells the interpreter which of its predefined functions to execute to perform the actions needed as the interpreted "program" runs.
JIT compilers like O2 and FBSL's DynAsm and DynC do generate genuine executable machine code on the fly. This is why they may be susceptible to DEP restrictions if Charles Pegge and Mike Lobanovsky don't take precautions to guard their respective users against Microsoft's encroachment on their civil rights. :)
Java does not, but it lets you provide your security policy via ClassLoaders.
Since the interpreted "program" is simply readable/writable data but not executable machine code, there is nothing that prevents us theoretically from changing it whenever we want. Smalltalk does allow it. So does FBSL's BASIC through its ExecLine() function which allows the user to modify the existing "code" already loaded in memory or to add new "code" to the "code" that's already running. Java however doesn't allow it presumably to preclude the development of malware "code" from scratch or the modification of "code" already running, for ill-intended purposes.
the guru you hire still needs a security layer on top of Win32,
i.e. he must be sure the executables are not patched before loading them.
Actually there are a lot of methods for the developer to protect their creation against undesirable or ill-minded modification. For example, the integrity of such parts in an FBSL executable as its "code" (script) stub or user resource data can be protected with checksums. The entire .exe file is also protected with its own checksum written into its standard PE header though of course this "protection" is the weakest of all and it can be easily bypassed by malware in a carelessly written virus-unaware piece of user code.
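As a rough illustration of the checksum idea only: the sketch below appends a CRC-32 to a script stub and verifies it before use. It is written in Python, it is not FBSL's actual scheme and not the PE header checksum algorithm, and the names `seal` and `verify` are invented for the example.

```python
import zlib

def seal(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def verify(sealed: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the payload."""
    if len(sealed) < 4:
        return False
    payload, stored = sealed[:-4], sealed[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored

script = b'print "hello"'
sealed = seal(script)

# Simulate a patch: one byte of the script changes, the old checksum stays.
tampered = b'print "jello"' + sealed[-4:]

print(verify(sealed))     # True
print(verify(tampered))   # False
```

A plain CRC only catches accidental or careless patching. Anyone who can rewrite the payload can recompute the checksum as well, which is the same weakness noted above for the PE header checksum.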