Oxygen Basic
Information => Reference => Topic started by: Mike Trader on October 06, 2010, 12:14:52 AM
-
Why is Oxygen a JIT compiler?
"A common goal of using JIT techniques is to reach or surpass the performance of static compilation, while maintaining the advantages of bytecode interpretation: Much of the "heavy lifting" of parsing the original source code and performing basic optimization is often handled at compile time, prior to deployment: compilation from bytecode to machine code is much faster than compiling from source. The deployed bytecode is portable, unlike native code. Since the runtime has control over the compilation, like interpreted bytecode, it can run in a secure sandbox. Compilers from bytecode to machine code are easier to write, because the portable bytecode compiler has already done much of the work."
http://en.wikipedia.org/wiki/Just-in-time_compilation
But I suspect there is a good deal of forward thinking here (Mac, Linux, Java AVM...)?
-
Welcome to the forum Mike!
I use the term JIT very broadly in that Oxygen can go from source code directly into executable memory image. Another slightly different JIT thing it can do during run time is take a string of source code, perform a secondary compile, and run it on-the-fly sharing the same workspace.
In practice the compilation speed is fast enough for small applications and tools to run directly from source code, making PE file generation unnecessary.
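A minimal sketch of the run-time string compile idea, written in Python rather than Oxygen. Python's built-in compile() and exec() stand in for Oxygen's secondary compile, and the dictionary named workspace is a hypothetical stand-in for the shared workspace; none of this is Oxygen's actual API.

```python
# Illustrative only: Python built-ins stand in for Oxygen's run-time compile.

workspace = {"total": 10}  # variables shared with the host program

# A string of source code that only becomes known at run time.
source = "total = total + 5\nmessage = 'compiled at run time'"

code = compile(source, "<runtime-string>", "exec")  # secondary compile
exec(code, workspace)                               # run in the shared workspace

print(workspace["total"], workspace["message"])
```

The compiled string reads and updates the same variables as the host code, which is the property described above.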
Oxygen also supports some features of byte-code languages such as loose typing of variables. I don't think there is any theoretical limit on how smart you can make a compiler.
As for sandboxes, O2 is a bit hardcore at present, but as the architecture matures, introducing new features to make coding easier on the knees won't be a problem.
Oxygen could also be embedded as a compile engine for another language.
Charles
-
One thing that also could be accomplished is to have a debugger window linked into the in-memory compiled program to look at the variable values at run-time; maybe even with the ability to change them on the fly.
-
This would be very useful for high level debugging. I thought of using the Windows Shared-Memory mechanism which allows processes to share a named memory area. This is more efficient than sharing a file and would allow a monitoring tool to be independently loaded and link in with the program under test conditions.
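The named shared-memory idea can be sketched in Python with the standard multiprocessing.shared_memory module (on Windows it is backed by the same kind of named memory area). The record layout, the area name and the function names are assumptions made for illustration, and both sides run in one process here only to keep the example short.

```python
import os
import struct
from multiprocessing import shared_memory

LAYOUT = "<id"  # hypothetical record: one 32-bit integer, one 64-bit float

def publish(name, counter, ratio):
    """Program under test: create the named area and store its variables."""
    shm = shared_memory.SharedMemory(name=name, create=True,
                                     size=struct.calcsize(LAYOUT))
    struct.pack_into(LAYOUT, shm.buf, 0, counter, ratio)
    return shm

def peek(name):
    """Monitoring tool: attach by name, copy the values out, detach."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return struct.unpack_from(LAYOUT, shm.buf, 0)
    finally:
        shm.close()

def demo():
    name = f"o2_debug_demo_{os.getpid()}"  # hypothetical area name
    owner = publish(name, 42, 0.5)
    try:
        first = peek(name)
        struct.pack_into(LAYOUT, owner.buf, 0, 43, 0.25)  # variables change
        second = peek(name)
        return first, second
    finally:
        owner.close()
        owner.unlink()
```

In a real setup peek() would live in a separately loaded monitoring tool, which only needs to know the area name and the layout.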
For debugging at the lower level the compiler could be switched to a limit-checking mode. This would insert code to verify that array indexes are within bounds, pointers remain within their prescribed memory area, and variables remain within a preset limit. (You would not want this extra code in your finished product.)
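A toy illustration of a limit-checking switch, in Python rather than Oxygen. The generator below emits a store routine with or without an inserted bounds check; emit_store and build are hypothetical names, and nothing here reflects how Oxygen would actually implement the mode.

```python
def emit_store(array_name, size, checked):
    """Return source for a store routine, optionally with a bounds check."""
    lines = [f"def store({array_name}, index, value):"]
    if checked:
        # Extra code that only exists in limit-checking mode.
        lines.append(f"    if not 0 <= index < {size}:")
        lines.append(
            f"        raise IndexError(f'index {{index}} outside 0..{size - 1}')")
    lines.append(f"    {array_name}[index] = value")
    return "\n".join(lines)

def build(checked):
    """Compile the emitted source and return the resulting function."""
    namespace = {}
    exec(emit_store("data", 4, checked), namespace)
    return namespace["store"]

debug_store = build(checked=True)     # development build
release_store = build(checked=False)  # finished product, no extra code
```

With a four-element list, release_store(data, -1, 9) silently writes to the last element (Python wraps negative indexes), while debug_store(data, -1, 9) raises IndexError at the point of the mistake.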
But with the sort of debugging I am faced with, the main issue is minimising the amount of information generated. One needs to trap events very precisely. The only tools I use for debugging the compiler are the message box and low cunning :)
Charles
-
Yeah, there was a big debate in the C++ forums about the usefulness of debuggers. It seems seasoned guys can do more, faster, with well-placed print statements. I must admit I use this technique for all my development work, even multithreaded debugging, which can get very tricky with multiple instances of FastCGI apps running at the same time. I have no idea how you would use a debugger in that case.
Some events can't easily be simulated, like say a PayPal payment notification. Their sandbox does not really do what the live mode does, so you just have to wait for the real event and then go back and look through the debug spew.
But back to JIT.
(I love the flexibility of the run time string compile)
Does the initial compilation produce byte code? I am wondering if you are leaving that possibility open to run on other platforms?
-
That's interesting Mike. I have never used a debugger because I could never envisage how they could be configured to trap the bugs I was after. They are often highly circumstantial and a print message is versatile enough to catch them. But I am experimenting with a few extra metacommands to aid compile-time checking and debugging. More on this later.
On your question about byte code, the answer is no - at least not yet, the first stage translates directly into x86 Assembler.
To parody Lewis Carroll, Oxygen compiling is an agony of four fits:
Fit the First:
A combination of preprocessor and high level language translation into Assembler source code.
Fit the Second:
Assembler source code is translated into O2 machine script
Fit the Third:
O2 machine script is converted into executable binary
Fit the Fourth:
The executable binary image is mapped and condensed into a PE file (or ELF file in the future for Linux).
This stage is not needed for direct execution of the binary.
The output from each of these stages is extractable, and I see benefits in providing a byte machine between fit the first and fit the second. Then it would be easy to emit, say, ARM assembler or even C instead of x86 Asm.
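The staged design can be pictured with a toy pipeline, written in Python and covering only the first three fits. The source grammar ("return n + n ...") and the "imm32" script notation are invented for this sketch and are not Oxygen's formats; the three opcodes used (B8, 05, C3) are the standard x86 encodings for mov eax,imm32, add eax,imm32 and ret.

```python
import struct

def fit1_translate(source):
    """High-level source to assembler text (toy grammar: return n + n ...)."""
    keyword, expr = source.split(None, 1)
    if keyword != "return":
        raise ValueError("toy grammar only understands: return <n> + <n> ...")
    terms = [int(t) for t in expr.split("+")]
    asm = [f"mov eax,{terms[0]}"]
    asm += [f"add eax,{t}" for t in terms[1:]]
    asm.append("ret")
    return asm

OPCODES = {"mov eax": "B8", "add eax": "05"}  # imm32 forms

def fit2_assemble(asm):
    """Assembler text to a toy machine script: hex opcode plus symbolic operand."""
    script = []
    for line in asm:
        if line == "ret":
            script.append("C3")
        else:
            op, imm = line.split(",")
            script.append(f"{OPCODES[op]} imm32 {imm}")
    return script

def fit3_encode(script):
    """Machine script to raw bytes, resolving each symbolic operand."""
    out = bytearray()
    for item in script:
        parts = item.split()
        out.append(int(parts[0], 16))
        if len(parts) == 3:
            out += struct.pack("<i", int(parts[2]))  # little-endian 32-bit
    return bytes(out)

def compile_all(source):
    """Run the fits in order, keeping every intermediate output."""
    stages = {"asm": fit1_translate(source)}
    stages["script"] = fit2_assemble(stages["asm"])
    stages["binary"] = fit3_encode(stages["script"])
    return stages
```

Because compile_all() keeps each stage's output, a different back end could be substituted after the first stage without touching the front end, which is the benefit suggested above.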
Charles
-
>But with the sort of debugging I am faced with, the main issue is minimising the amount of information generated. One needs to trap events very precisely. The only tools I use for debugging the compiler are the message box and low cunning
Check this debugger out and see if you can use it with O2h.
http://www.ollydbg.de/
' THE HELLO PROGRAM
#basic
print `Hello!`
terminate
-
That is impressive John! The main problem is it produces too much fine grained data. Actually Oxygen can display the assembly code of any section of code, large or small, and I have macros to display the register content. I'll include them in the tool section of the next alpha. Usually it is sufficient just to print the Basic variables, though it is sometimes informative to inspect the FPU stack.
You can try this in Oxygen. It will work with any of the examples using the Scit IDE.
Choose the lines you want to inspect.
Insert ### above and below the block of lines.
Compile the code with Ctrl-F7
Enlarge the lower panel and you should see the Assembly code listing for the ### marked section.
Another trick for looking at any cpu register:
Supposing you wanted to check out the EBX register at a particular point in the program:
sys a=EBX
print hex a
You can even take a snapshot of the EAX register which is extremely volatile since almost every operation uses it:
sys a
'...
push EAX 'save contents before EAX is used for anything else
pop a 'pop value directly into variable A
'...
print hex a
Charles
-
>The main problem is it produces too much fine grained data.
I noticed this debugger being mentioned on the PowerBASIC forum. Someone wrote a plug-in for PB that allows the PB source to be incorporated into the debugger.
Sounds like you have a foundation for a debugger already in place with O2h.
-
This is the listing you get for the line print "hello". (Ctrl-F7)
Machine script on the left, Assembly code on the right. Notice that the "Hello" is not present. This is placed in the data area and the linker will later pick up the "gc 1 " reference and fill in the offset address.
###
print "Hello"
###
If I have done my job properly, you will never need to debug to this level of detail. But it is very useful when working on the compiler itself.
>"C:\cevp\projects\opcode\OxygenBasic\gxo2" " -a -c C:\cevp\projects\opcode\OxygenBasic\t.bas"
8D 83 gc 1 ;
FF 93 68 08 00 00 ; call [ebx+2152]
FF 93 78 08 00 00 ; call [ebx+2168]
33 C0 ; xor eax,eax
FF 93 88 08 00 00 ; call [ebx+2184]
50 ; push eax
FF 93 C0 09 00 00 ; call [ebx+2496]
50
Okay
>Exit code: 0
Charles
-
If I haven't said this more than once, using C header files is a dream come true in a Basic language.
Nice job Charles!
I also appreciate the trend toward a typeless variable and relaxed bracketing/comma requirements.
I hope SB has been an inspiration in some form or another.
-
Well I am exploring the possibility of offering truly typeless variables as you would find in ScriptBasic. It blurs the distinction between scripting languages and compilers. Variants would be a good design template.
Associative arrays are interesting too. I think these, along with dynamic arrays could be efficiently handled by the OOP infrastructure. These would look like normal arrays but the compiler turns them into Array objects with various methods for index resolution and bounds checking.
An associative array essentially uses keywords instead of conventional indexes. For instance:
dbase{"fred","phone"}="0123 456789"
(ScriptBasic uses curly braces to distinguish associative arrays from indexed ones.)
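A small sketch of an array object that resolves keyword indexes, in Python rather than Oxygen. The class name AssocArray and its methods are hypothetical; the point is only that the array syntax can be routed through an object's index-resolution method.

```python
class AssocArray:
    """Array object indexed by one or more keywords."""

    def __init__(self):
        self._cells = {}

    @staticmethod
    def _resolve(keys):
        # Index resolution: normalise a single key or a key list to a tuple.
        return keys if isinstance(keys, tuple) else (keys,)

    def __setitem__(self, keys, value):
        self._cells[self._resolve(keys)] = value

    def __getitem__(self, keys):
        return self._cells[self._resolve(keys)]

dbase = AssocArray()
dbase["fred", "phone"] = "0123 456789"
print(dbase["fred", "phone"])
```

The caller writes what looks like ordinary array indexing, and the object decides how the keys map to storage.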
Charles
-
ScriptBasic allows a mix of both numeric and associative array indexing methods when building complex structures. What I like is that assignments can be made to any element without dimensioning it or defining its constraints first. The LBOUND/UBOUND functions keep track of bounds as element assignments are arbitrarily made. Unassigned elements, when referenced, return undef as the value.
SB Array Tutorial (http://www.scriptbasic.org/forum/index.php/topic,170.0.html)
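The behaviour described in this post (assign anywhere, bounds tracked automatically, undef for unassigned elements) can be approximated in a few lines of Python. This models only the description above for numeric indexes; it is not ScriptBasic's implementation, and UNDEF is just a stand-in for undef.

```python
UNDEF = None  # stands in for ScriptBasic's undef

class AutoArray:
    """Numeric array that needs no dimensioning before assignment."""

    def __init__(self):
        self._cells = {}

    def __setitem__(self, index, value):
        self._cells[index] = value

    def __getitem__(self, index):
        # Unassigned elements yield UNDEF instead of raising an error.
        return self._cells.get(index, UNDEF)

    def lbound(self):
        return min(self._cells) if self._cells else UNDEF

    def ubound(self):
        return max(self._cells) if self._cells else UNDEF

a = AutoArray()
a[5] = "five"
a[-2] = "minus two"
print(a.lbound(), a.ubound(), a[0])
```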
-
This kind of flexibility is certainly do-able for a compiler but comes at a high price in terms of performance. It would still be an improvement over scripting, and you can at least start out with a loosely defined idea, later to firm up for faster execution and efficiency.
For instance once you have established a well defined set of index keys, they can be turned into enumerations and the array can be given a fixed size.
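As a sketch of that firming-up step, in Python: once the key set is settled, the keywords become an enumeration and the storage becomes a fixed-size list. The Field names are hypothetical examples.

```python
from enum import IntEnum

class Field(IntEnum):  # hypothetical key set, once it has stabilised
    NAME = 0
    PHONE = 1
    EMAIL = 2

record = [None] * len(Field)        # fixed size, known up front
record[Field.PHONE] = "0123 456789"  # plain integer index, no key lookup

print(record)
```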
Charles
-
>This kind of flexibility is certainly do-able for a compiler but comes at a high price in terms of performance.
Linked-list array management is a performance hit for sure. It's the slowest function in SB I have found so far. Once arrays are created, though, access is rather quick, as if they were predefined.
-
>Usually it is sufficient just to print the Basic variables, though it is sometimes informative to inspect the FPU stack.
Exactly right.
I must say I am a big fan of weakly typed languages. http://en.wikipedia.org/wiki/Weak_typing
One of the biggest pains in C++ is casting variables in expressions. A developer should know what to expect when he assigns a DWORD to an INT for example.
-
I think the idea is to store data as bytes and determine type on use. There are few times I had to use a VAL() with ScriptBasic.
a = "1"
b = 2
PRINT a + b
Would return 3, as the + operator tells SB that we are dealing in numbers with the stored values. After you play with SB for a while, it all falls into place.
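The "determine type on use" idea can be sketched in Python: values are stored as they come, and the operator decides how to interpret them. The function names are hypothetical and the conversion rules are a simplification, not ScriptBasic's actual rules.

```python
def to_number(value):
    """Interpret a stored value as a number, as a numeric operator would."""
    if isinstance(value, (int, float)):
        return value
    text = str(value).strip()
    try:
        return int(text)
    except ValueError:
        return float(text)

def add(a, b):
    """Numeric operator: both operands are read as numbers."""
    return to_number(a) + to_number(b)

def concat(a, b):
    """String operator: both operands are read as text."""
    return str(a) + str(b)

a = "1"
b = 2
print(add(a, b), concat(a, b))
```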
-
@Mike,
The Windows32 SDK bears the scars of excessive type specification. Dealing with these makes type matching more complicated but I hope it should be easy enough to lift hunks of C code and headers and get them to work in Oxygen. When I come across succinct examples from MSDN I will include them with the examples.
To override a type and get the integer value directly you can use "?" as a prefix. Example: ? MyBstr will yield the integer value of the BSTR (a pointer) instead of its string content.
@John.
So far I've got automatic value to string conversion but I have not been so successful going from string to value.
This is largely due to function overloading rules, which allow functions of the same name but different prototypes to be supported. The compiler cannot always make the correct decision about which function to use if it is over-enthusiastic about type conversion. Perhaps string-to-number conversion could be attempted as a last resort. This is where things are not so complex for ScriptBasic, as it is able to make such decisions at run-time based on the string content.
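One way to picture "string-to-number as a last resort" is a cost-ranked overload resolver. The Python sketch below is only an illustration of the idea; the cost values, the function names and the resolution rules are all assumptions, not Oxygen's overloading rules.

```python
def cost(arg, wanted):
    """Conversion cost for one argument, or None if it cannot be converted."""
    if type(arg) is wanted:
        return 0                  # exact match
    if wanted is str and isinstance(arg, (int, float)):
        return 1                  # value to string: always possible
    if wanted in (int, float) and isinstance(arg, str):
        try:
            wanted(arg)
        except ValueError:
            return None
        return 10                 # string to value: last resort
    return None

def resolve(overloads, args):
    """Call the cheapest matching overload; refuse ambiguous calls."""
    ranked = []
    for params, func in overloads:
        if len(params) != len(args):
            continue
        costs = [cost(a, p) for a, p in zip(args, params)]
        if None not in costs:
            ranked.append((sum(costs), params, func))
    if not ranked:
        raise TypeError("no matching overload")
    ranked.sort(key=lambda entry: entry[0])
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        raise TypeError("ambiguous call")
    _, params, func = ranked[0]
    return func(*(p(a) for a, p in zip(args, params)))

def show_text(s):
    return "text:" + s

def show_number(n):
    return "num:" + str(n * 2)

overloads = [((str,), show_text), ((int,), show_number)]
```

With both overloads present, resolve(overloads, ("21",)) picks the string version because the exact match is cheapest. The numeric version is reached from a string argument only when no string overload exists.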
Charles
-
>So far I've got automatic value to string conversion but ...
So O2h will now allow the following?
a = "1"
b = 2
PRINT a & b
Results: "12"
-
Yes exactly.
One further difference: in Oxygen & and + are interchangeable. And when variables are juxtaposed without operators then addition or concatenation is assumed.
Currently these are equivalent:
print a & b
print a + b
print a b
-
>One further difference: in Oxygen & and + are interchangeable.
If that's the case, how does O2h know if you're adding the numbers or concatenating as a string?
-
If you are assigning to a string then each of the numeric terms will be converted to strings. To ensure that numbers are added together rather than concatenated when strings are present you need to put them inside an explicit str().
In this example str does not need brackets as it is the final function in the statement.
print "Result " str 1 2 3
Result 6
I should also clarify that when & is used with strings it is a concatenator, but when applied to numbers it uses the C interpretation and performs a bitwise and.
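The rules in this post can be modelled in a few lines of Python. This is a reading of the description, not Oxygen's implementation; amp and str_of are hypothetical names standing in for the & operator and for str applied to juxtaposed numbers.

```python
def amp(a, b):
    """'&' as described: concatenate when a string is involved,
    otherwise perform a bitwise AND on the numbers."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a & b

def str_of(*numbers):
    """str with juxtaposed numeric terms: add first, then convert to text."""
    return str(sum(numbers))

print(amp("1", 2))                # string present: concatenation
print(amp(6, 3))                  # numbers only: bitwise AND
print("Result " + str_of(1, 2, 3))
```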
-
>To ensure that numbers are added together rather than concatenated when strings are present you need to put them inside an explicit str().
>In this example str does not need brackets as it is the final function in the statement.
>print "Result " str 1 2 3
>Result 6
I would have thought the Result would be "123" not 6 if the STR() forces string concatenation. It just seems backwards to me.
I think the standard + for math and & for concatenation (or bitwise AND) is more readable. I would hate to see O2h get too abstract with simplification and multifunctional operators where you have to guess what is going on.
-
Unfortunately & has too many meanings for comfort, and Oxygen has to conform to C usage as well as accommodate string concat and hex &h numbers in Basic.
Str always takes a numeric argument so str(1 2 3) or str(1+2+3) will add the numbers as expected and return them in string format.
The programmer has the choice whether to use the abbreviated forms of expression or use the full syntax.
It all depends on getting the clearest most readable code. The resulting binary will be the same.
PS: the keywords "and" and "or" are still available, and I prefer using them to & | && ||