Topic: Talking JIT

Mike Trader

  • Guest
Talking JIT
« on: October 06, 2010, 12:14:52 AM »
Why is Oxygen a JIT compiler?

"A common goal of using JIT techniques is to reach or surpass the performance of static compilation, while maintaining the advantages of bytecode interpretation: Much of the "heavy lifting" of parsing the original source code and performing basic optimization is often handled at compile time, prior to deployment: compilation from bytecode to machine code is much faster than compiling from source. The deployed bytecode is portable, unlike native code. Since the runtime has control over the compilation, like interpreted bytecode, it can run in a secure sandbox. Compilers from bytecode to machine code are easier to write, because the portable bytecode compiler has already done much of the work."
http://en.wikipedia.org/wiki/Just-in-time_compilation

But I suspect there is a good deal of forward thinking here (Mac, Linux, Java AVM...)?

Charles Pegge

  • Guest
Re: Talking JIT
« Reply #1 on: October 06, 2010, 01:33:28 AM »

Welcome to the forum Mike!

I use the term JIT very broadly, in that Oxygen can go from source code directly into an executable memory image. Another, slightly different JIT feature is that, at run time, it can take a string of source code, perform a secondary compile, and run it on the fly, sharing the same workspace.
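To make the run-time string compile concrete, here is a minimal sketch of the same idea. It is written in Python, not OxygenBasic, and uses Python's built-in compile() and exec(); the names workspace and source are made up for the illustration and say nothing about how Oxygen does it internally.

```python
# Shared workspace: the host program and the string-compiled code
# both read and write this one dictionary of names.
workspace = {"total": 10}

# Source text that only becomes known at run time.
source = "total = total + 5\ndoubled = total * 2"

# Secondary compile: turn the string into a code object ...
code = compile(source, "<runtime-string>", "exec")

# ... and run it on the fly against the shared workspace.
exec(code, workspace)

print(workspace["total"], workspace["doubled"])
```

The point is that the compiled string sees the variables the host already created, and the host sees whatever the string changed or added.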

In practice the compilation speed is fast enough for small applications and tools to run directly from source code, making PE file generation unnecessary.

Oxygen also supports some features of byte-code languages, such as loose typing of variables. I don't think there is any theoretical limit on how smart you can make a compiler.

As for sandboxes, O2 is a bit hardcore at present, but as the architecture matures, introducing new features to make coding easier on the knees won't be a problem.

Oxygen could also be embedded as a compile engine for another language.

Charles

efgee

  • Guest
Re: Talking JIT
« Reply #2 on: October 06, 2010, 10:39:25 AM »
One thing that also could be accomplished is to have a debugger window linked into the in-memory compiled program to look at the variable values at run-time; maybe even with the ability to change them on the fly.

Charles Pegge

  • Guest
Re: Talking JIT
« Reply #3 on: October 06, 2010, 08:29:58 PM »

This would be very useful for high level debugging. I thought of using the Windows Shared-Memory mechanism which allows processes to share a named memory area. This is more efficient than sharing a file and would allow a monitoring tool to be independently loaded and link in with the program under test conditions.
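A rough sketch of the named shared-memory idea, in Python rather than OxygenBasic. It uses the standard multiprocessing.shared_memory module, which wraps the Windows named-mapping mechanism on Windows. Both ends run in one process here only to keep the example short; the 8-byte layout and the values 1234 and 99 are assumptions for the illustration.

```python
import struct
from multiprocessing import shared_memory

# Program under test: publish a variable in a named shared-memory block.
# The OS picks the name here; a real tool would agree on one in advance.
block = shared_memory.SharedMemory(create=True, size=8)
struct.pack_into("<q", block.buf, 0, 1234)  # one 64-bit signed value

# Monitoring tool: attach by name (normally from a separate process).
monitor = shared_memory.SharedMemory(name=block.name)
seen = struct.unpack_from("<q", monitor.buf, 0)[0]
print("monitor sees", seen)

# The monitor can also change the value on the fly.
struct.pack_into("<q", monitor.buf, 0, 99)
changed = struct.unpack_from("<q", block.buf, 0)[0]
print("program now sees", changed)

# Release the mapping on both sides, then remove the block.
monitor.close()
block.close()
block.unlink()
```

Because the monitor attaches by name, it can be started and stopped independently of the program being watched.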

For debugging at the lower level, the compiler could be switched to a limit-checking mode. This would insert code to verify that array indexes are within bounds, that pointers remain within their prescribed memory area, and that variables remain within preset limits. (You would not want this extra code in your finished product.)
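As an illustration of what such inserted checks amount to, here is a small Python sketch (not OxygenBasic). LIMIT_CHECKING and CheckedArray are hypothetical names; only the index and value-limit checks are shown, since pointer range checks have no direct Python equivalent.

```python
LIMIT_CHECKING = True  # hypothetical switch; a release build would set False


class CheckedArray:
    """Fixed-size array whose accesses are verified only in checking mode."""

    def __init__(self, size, low, high):
        self._items = [low] * size
        self._low, self._high = low, high  # preset limits for stored values

    def _verify(self, index, value=None):
        if not LIMIT_CHECKING:
            return  # checks switched off: no overhead, no safety net
        if not 0 <= index < len(self._items):
            raise IndexError(f"index {index} is outside 0..{len(self._items) - 1}")
        if value is not None and not self._low <= value <= self._high:
            raise ValueError(f"value {value} is outside {self._low}..{self._high}")

    def __getitem__(self, index):
        self._verify(index)
        return self._items[index]

    def __setitem__(self, index, value):
        self._verify(index, value)
        self._items[index] = value


readings = CheckedArray(4, low=0, high=100)
readings[3] = 42

trapped = []
for index, value in [(4, 1), (0, 250)]:  # bad index, then bad value
    try:
        readings[index] = value
    except (IndexError, ValueError) as err:
        trapped.append(type(err).__name__)
print(trapped)
```

In a compiler the switch would decide whether the checking code is emitted at all, rather than being tested at run time as it is here.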

But with the sort of debugging I am faced with, the main issue is minimising the amount of information generated. One needs to trap events very precisely. The only tools I use for debugging the compiler are the message box and low cunning :)

Charles

Mike Trader

  • Guest
Re: Talking JIT
« Reply #4 on: October 08, 2010, 01:32:18 AM »
Yeah, there was a big debate in the C++ forums about the usefulness of debuggers. It seems seasoned guys can get more done, faster, with well-placed print statements. I must admit I use this technique for all my development work, even multithreaded debugging, which can get very tricky with multiple instances of FastCGI apps running at the same time. I have no idea how you would use a debugger in that case.

Some events can't easily be simulated, like say a PayPal payment notification. Their sandbox does not really do what the live mode does, so you just have to wait for the real event and then go back and look through the debug spew.

But back to JIT.
(I love the flexibility of the run time string compile)
Does the initial compilation produce byte code? I am wondering if you are leaving that possibility open to run on other platforms?


Charles Pegge

  • Guest
Re: Talking JIT
« Reply #5 on: October 08, 2010, 04:30:38 AM »

That's interesting Mike. I have never used a debugger because I could never envisage how they could be configured to trap the bugs I was after. They are often highly circumstantial and a print message is versatile enough to catch them. But I am experimenting with a few extra metacommands to aid compile-time checking and debugging. More on this later.

On your question about byte code, the answer is no, at least not yet: the first stage translates directly into x86 Assembler.

To parody Lewis Carroll, Oxygen compiling is an agony of four fits:


Fit the First:

A combination of preprocessor and high level language translation into Assembler source code.


Fit the Second:

Assembler source code is translated into O2 machine script.


Fit the Third:

O2 machine script is converted into executable binary.


Fit the Fourth:

The executable binary image is mapped and condensed into a PE file (or ELF file in the future for Linux).
This stage is not needed for direct execution of the binary.



The output from each of these stages is extractable, and I see benefits in providing a byte machine between fit the first and fit the second. Then it would be easy to emit, say, ARM assembler or even C instead of x86 Asm.
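The staged design described above can be sketched as a list of named stages whose intermediate outputs are all kept. This is a Python illustration only: the stage names and the placeholder lambdas are invented, and each one merely tags its input so the data flow is visible.

```python
def run_pipeline(source, stages, stop_after=None):
    """Run named stages in order, keeping every intermediate output."""
    outputs = {}
    data = source
    for name, stage in stages:
        data = stage(data)
        outputs[name] = data
        if name == stop_after:  # e.g. skip the file-writing stage
            break
    return outputs


# Placeholder stages standing in for the four fits.
stages = [
    ("fit1_asm", lambda src: f"asm({src})"),
    ("fit2_script", lambda asm: f"script({asm})"),
    ("fit3_binary", lambda script: f"binary({script})"),
    ("fit4_pe", lambda binary: f"pe({binary})"),
]

# Direct execution needs only the first three fits.
outputs = run_pipeline('print "Hello"', stages, stop_after="fit3_binary")
for name, result in outputs.items():
    print(name, "->", result)
```

With this shape, slotting in an extra stage or swapping the x86 back end for another target means changing one entry in the list.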

Charles

JRS

  • Guest
Re: Talking JIT
« Reply #6 on: October 08, 2010, 09:16:50 PM »
Quote from: Charles
But with the sort of debugging I am faced with, the main issue is minimising the amount of information generated. One needs to trap events very precisely. The only tools I use for debugging the compiler are the message box and low cunning

Check this debugger out and see if you can use it with O2h.

http://www.ollydbg.de/

Code:
' THE HELLO PROGRAM

#basic
print `Hello!`
terminate

« Last Edit: October 09, 2010, 05:27:18 PM by JRS »

Charles Pegge

  • Guest
Re: Talking JIT
« Reply #7 on: October 08, 2010, 10:31:42 PM »

That is impressive, John! The main problem is that it produces too much fine-grained data. Actually, Oxygen can display the assembly code of any section of code, large or small, and I have macros to display the register contents. I'll include them in the tool section of the next alpha. Usually it is sufficient just to print the Basic variables, though it is sometimes informative to inspect the FPU stack.

You can try this in Oxygen. It will work with any of the examples using the Scit IDE.


Choose the lines you want to inspect.

Insert ### above and below the block of lines.

Compile the code with Ctrl-F7

Enlarge the lower panel and you should see the Assembly code listing for the ### marked section.



Another trick for looking at any cpu register:

Supposing you wanted to check out the EBX register at a particular point in the program:

Code:
sys a=EBX
print  hex a

You can even take a snapshot of the EAX register which is extremely volatile since almost every operation uses it:

Code:
sys a
'...
push EAX 'save contents before EAX is used for anything else
pop a 'pop value directly into variable A
'...
print hex a

Charles

JRS

  • Guest
Re: Talking JIT
« Reply #8 on: October 09, 2010, 06:59:16 AM »
Quote
The main problem is it produces too much fine grained data.

I noticed this debugger being mentioned on the PowerBASIC forum. Someone wrote a plug-in for PB that allows the PB source to be incorporated into the debugger.

Sounds like you have a foundation for a debugger already in place with O2h.
« Last Edit: October 09, 2010, 05:27:45 PM by JRS »

o2admin

  • Administrator
Re: Talking JIT
« Reply #9 on: October 09, 2010, 11:28:52 AM »
This is the listing you get for the line print "hello". (Ctrl-F7)

Machine script on the left, Assembly code on the right. Notice that the "Hello" is not present. This is placed in the data area and the linker will later pick up the "gc 1 " reference and fill in the offset address.

Code:
###
print "Hello"
###

If I have done my job properly, you will never need to debug to this level of detail. But it is very useful when working on the compiler itself.


Code:
>"C:\cevp\projects\opcode\OxygenBasic\gxo2" " -a -c C:\cevp\projects\opcode\OxygenBasic\t.bas"
 8D 83 gc 1                     ; 
 FF 93 68 08 00 00              ;  call [ebx+2152]
 FF 93 78 08 00 00              ;  call [ebx+2168]
 33 C0                          ;  xor eax,eax
 FF 93 88 08 00 00              ;  call [ebx+2184]
 50                             ;  push eax
 FF 93 C0 09 00 00              ;  call [ebx+2496]
 50   

Okay
>Exit code: 0
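
The way a linker might later fill in the "gc 1" reference can be sketched in a few lines of Python. This is a guess at the mechanism, not Oxygen's actual linker: it assumes "gc N" stands for a 4-byte little-endian offset of data item N, and the offset 0x1000 is invented for the example.

```python
import struct


def link(script, data_offsets):
    """Replace each 'gc N' reference with a 4-byte little-endian offset."""
    tokens = script.split()
    out = bytearray()
    i = 0
    while i < len(tokens):
        if tokens[i] == "gc":
            # Symbolic reference: look up data item N and emit its offset.
            out += struct.pack("<I", data_offsets[int(tokens[i + 1])])
            i += 2
        else:
            # Ordinary hex byte copied straight through.
            out.append(int(tokens[i], 16))
            i += 1
    return bytes(out)


# First line of the listing above, with a hypothetical offset for item 1.
binary = link("8D 83 gc 1", {1: 0x1000})
print(binary.hex(" "))
```

The machine script stays position-independent until this step, which is why the string itself never shows up in the listing.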


Charles

JRS

  • Guest
Re: Talking JIT
« Reply #10 on: October 09, 2010, 06:10:47 PM »
If I haven't said this more than once, using C header files is a dream come true in a Basic language.

Nice job Charles!

I also appreciate the trend toward a typeless variable and relaxed bracketing/comma requirements.

I hope SB has been an inspiration in some form or another.

« Last Edit: October 09, 2010, 06:15:37 PM by JRS »

Charles Pegge

  • Guest
Re: Talking JIT
« Reply #11 on: October 10, 2010, 09:34:35 AM »

Well I am exploring the possibility of offering truly typeless variables as you would find in ScriptBasic. It blurs the distinction between scripting languages and compilers. Variants would be a good design template.

Associative arrays are interesting too. I think these, along with dynamic arrays could be efficiently handled by the OOP infrastructure. These would look like normal arrays but the compiler turns them into Array objects with various methods for index resolution and bounds checking.

An associative array essentially uses keywords instead of conventional indexes. For instance:

dbase{"fred","phone"}="0123 456789"

(ScriptBasic uses curly braces to distinguish associative arrays from indexed ones.)
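
To picture how a compiler could turn that line into calls on an Array object, here is a small Python sketch. AssocArray and its _resolve method are hypothetical names; the square brackets are Python syntax, not a proposal for Oxygen.

```python
class AssocArray:
    """Array object whose index is one or more keywords."""

    def __init__(self):
        self._cells = {}

    def _resolve(self, keys):
        # Index resolution: treat a single key and a key list uniformly.
        return keys if isinstance(keys, tuple) else (keys,)

    def __setitem__(self, keys, value):
        self._cells[self._resolve(keys)] = value

    def __getitem__(self, keys):
        # Unassigned elements read back as None instead of raising.
        return self._cells.get(self._resolve(keys))


dbase = AssocArray()
dbase["fred", "phone"] = "0123 456789"
print(dbase["fred", "phone"])
print(dbase["fred", "email"])
```

The user writes what looks like a normal array access, and the lookup method hidden behind it decides how keys map to storage.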

Charles

JRS

  • Guest
Re: Talking JIT
« Reply #12 on: October 10, 2010, 10:31:13 AM »
ScriptBasic allows a mix of both numeric and associative array indexing methods when building complex structures. What I like is that assignments can be made to any element without dimensioning or defining its constraints first. The LBOUND/UBOUND functions keep track of bounds as element assignments are arbitrarily made. Unassigned elements, when referenced, return undef as the value.
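
A rough Python model of that behaviour, for anyone who has not used SB. This is not ScriptBasic code and not how SB stores arrays internally; AutoArray, lbound and ubound are names made up for the sketch, and None stands in for undef.

```python
class AutoArray:
    """Array that needs no dimensioning and tracks its own numeric bounds."""

    def __init__(self):
        self._cells = {}

    def __setitem__(self, index, value):
        self._cells[index] = value  # numeric or keyword index, any order

    def __getitem__(self, index):
        return self._cells.get(index)  # None plays the role of undef

    def _numeric(self):
        return [k for k in self._cells if isinstance(k, int)]

    def lbound(self):
        return min(self._numeric(), default=None)

    def ubound(self):
        return max(self._numeric(), default=None)


a = AutoArray()
a[10] = "ten"
a[3] = "three"
a["name"] = "mixed indexing"
print(a.lbound(), a.ubound(), a[5])
```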

SB Array Tutorial
« Last Edit: October 10, 2010, 12:01:22 PM by JRS »

Charles Pegge

  • Guest
Re: Talking JIT
« Reply #13 on: October 10, 2010, 02:57:00 PM »

This kind of flexibility is certainly do-able for a compiler but comes at a high price in terms of performance. It would still be an improvement over scripting, and you can at least start out with a loosely defined idea, later to firm up for faster execution and efficiency.

For instance once you have established a well defined set of index keys, they can be turned into enumerations and the array can be given a fixed size.
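
As a sketch of that firming-up step (in Python, with invented field names), the keyword keys become an enumeration and the storage becomes a plain fixed-size list:

```python
from enum import IntEnum


class Field(IntEnum):
    """Index keys that used to be strings, now fixed at definition time."""
    NAME = 0
    PHONE = 1
    EMAIL = 2


# One slot per key: the size is known up front, no hashing at run time.
record = [None] * len(Field)
record[Field.NAME] = "fred"
record[Field.PHONE] = "0123 456789"

print(record[Field.PHONE], len(record))
```

Each lookup is now a direct index, which is where the speed-up over keyword resolution comes from.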

Charles

JRS

  • Guest
Re: Talking JIT
« Reply #14 on: October 10, 2010, 04:33:17 PM »
Quote
This kind of flexibility is certainly do-able for a compiler but comes at a high price in terms of performance.

Linked-list array management is a performance hit for sure. It's the slowest function in SB I have found so far. Once arrays are created, though, access is rather quick, as if they were predefined.